PREFACE


If the instruction of students is to keep pace with the rapid development
of statistical science, frequent publication of books based on the
most recent knowledge in the field is required. Therefore, the author’s 
primary aim is to supply students with a book that is built on the recent 
advances in statistical theory and practice. Because they approach the 
study of statistics with different interests, aims, and backgrounds, it is 
not feasible to write one text that can meet the requirements of all 
classes of students. Whatever their approach, these students will have 
one thing in common, namely, they will need to acquire a thorough functional
understanding of statistical principles to make intelligent use of
statistics. The differences in interpretation of authors of statistical texts 
as to what functional understanding involves seem to range between the 
belief that statistical principles are working rules to be learned as quickly 
as possible for their utilitarian value and the conviction that an advanced 
knowledge of pure mathematics is the first requisite for the exposition 
of statistical principles. 

The author does not believe that either of these points of view is best 
for most of the students who need statistical training for their work. 
The former is likely to lead to blind, rule-of-thumb application of
statistical formulas; the latter is indispensable only for those who are to
become professional statisticians or mathematical statisticians. Neither 
practical applications nor mathematical analysis is excluded from this 
book. In fact, problems have been used abundantly to illustrate principles
or results. Also, a number of problems have been inserted whereby
the student may test his understanding of the statistical theory. The 
author is convinced that the detailed working through of problems is 
fundamental to a functional understanding of statistical techniques. 
Similarly, application of the principles underlying the design of experimental
or observational projects is necessary if a thorough grasp of these
principles is to be secured. Experiences in application are necessary for
the student if he is to design effective experiments of his own or to evaluate
those of others. However, the problems are considered as auxiliaries
to the study of the principles. 

Again, the mathematical analysis is not excluded because without 
mathematics there could be no serious study of statistics. But mathematics
has been viewed as the servant and not as the master. The
question of how much knowledge of mathematics ought to be assumed 
is difficult to answer. Not many students in the social and biological
sciences have a knowledge of calculus. It is the view of the author that
the student should have at least a background in calculus to be able to
follow the theoretical material, which cannot be advantageously treated
without considerable use of calculus. The understanding of statistical
as well as other scientific principles is relative, dependent upon what
intelligence, technical background, and experience the student may have.
Students with more knowledge of mathematics usually gain more complete
understanding of the mathematical formulation underlying a particular
statistical principle. They should, therefore, have the opportunity
of utilizing their more adequate preparation. However, the
number of students with special mathematical training is very limited.
But the student with, for example, no calculus, may omit the few sections
of the book in which the calculus is used. Even without these
sections, he should be able to acquire a considerable and continuous 
knowledge of the essentials of statistics from the non-technical, logical 
treatment accorded to most of the essentials in the book. 

It should also be explained where the book starts and where it ends. 
This book does not start from the very beginning of its subject. Many 
upper-class students and most graduate students have had an introduction
to statistics, usually called descriptive statistics, dealing with the
elementary processes in the reduction of data. Such preliminary training
is assumed. If the student does not have it, the instructor may
prefer to begin the subject by laying this elementary base himself. 
This book deals with the principal objective of statistics, which is to 
provide indispensable tools and methods for designing and executing 
experimental and other observational projects and for analyzing and 
interpreting the results. 

It would be advantageous to allot a full academic year to the objectives
of statistical methods as presented here. However, adjustment
can be made when less time is available by selecting certain portions 
regarded as most fundamental by the instructor. 

It was decided to bring the book to a close when its purpose was 
accomplished, that is, after the common principles of statistics had been 
investigated. The aim was not to present topics of interest to a few 
students only. 

The book is based on the content of a year's course in statistics, and 
was developed over a period of approximately ten years, primarily for 
graduate students in education and in psychology. During this time, 
content and method were continuously revised in light of experiences and 
scientific development of the subject. 

The author considers himself especially fortunate in having been a 
volunteer worker at the Galton Laboratory for a year, during which 
time he studied with Professor R. A. Fisher, foremost in laying the foundations
of modern statistical methods. During this period he also profited
from the lectures of and conferences with Professors J. Neyman and
Egon S. Pearson. 

The chief sources of information and help in developing the book have
been the many serious students whose criticisms and reactions were of 
inestimable value in attaining a clearer presentation of statistical methods. 
Most significant was the help from my very capable assistants. Among 
these should be especially mentioned Dr. Cyril Hoyt, Dr. Fei Tsao, Dr. 
Garland Kyle, and Mr. Stanley Clark, all of whom have made direct 
contributions to this work. 

I am greatly indebted to Dr. Robert W. B. Jackson of the University
of Toronto for his critical reading and constructive criticisms of
the work in manuscript from which I received valuable suggestions for 
its improvement. 

I am especially grateful to the following authors and publishers for 
their kind permission to reproduce certain tables which are given in the 
Appendix: 

(1) I am indebted to Professor R. A. Fisher and Dr. F. Yates, also
to Messrs. Oliver and Boyd Ltd., Edinburgh, for permission to reprint
Table No. III, Distribution of t, and Table No. IV, Distribution of χ²,
from their book, Statistical Tables for Biological, Medical, and Agricultural
Research;

(2) Professor George W. Snedecor and The Iowa State College Press
for permission to reproduce Table 10.7, 5% and 1% Points for the
Distribution of F, from Statistical Methods (Fourth Edition), 1946;

(3) Professor Egon Pearson, Editor of Statistical Research Memoirs,
for permission to reproduce Table IV, 5% limits for L₁, and Table V,
1% limits for L₁, computed by P. P. N. Nayer.


Palmer O. Johnson 
CHAPTER I

THE REALM OF STATISTICS 


Statistics in Daily Life

Our entrance into and departure from this world are recorded as 
statistical events. Birth and death, marriage and divorce, the school 
attendance of our children, the crops grown by farmers, the number of 
miles flown by commercial planes, the hours of our labor, the output of 
manufacturing plants, the acres of wood demanded for paper, the hours 
of sunshine, the inches of snowfall—all such events and activities are 
recorded somehow and somewhere. Myriads of such experiences and 
events affecting the daily lives of roundly two billion human beings lie 
behind the statistical data condensed in volumes, published and unpublished.
In reverse, we are daily translating into their real meaning
statistical data obtained from newspapers, radio reports, lectures, books, 
and conversations. We act in accordance with the reality implied in 
statistical data when we conserve fuel which is going to be scarce, when 
we ship wheat which will be necessary after a poor harvest in a foreign 
country, when we take precautionary measures against a disease of which 
unusually many cases have been recorded. 

The conception of statistics as having to do with figures is the most 
popular one, and for good reasons. The public is constantly exposed to 
statistical data occurring in advertisements, in arguments, and in the 
distribution of information. If something is said to have been statistically
proved, opposition is supposed to become quiescent. Everywhere
the ordinary citizen needs some ability to distinguish between
what is truth and what is falsehood. In a democracy he needs it most
where he participates in the settlement of public problems and contributes
toward the growth of public opinion. Citizens not only should
be able to look at controversial questions scientifically and dispassionately;
they should also acquire the habit of doing so. Education should
prepare them to cope intelligently with the problems of their lives and 
times; they must learn not only to think for themselves but likewise to 
act for themselves. There is danger in the educational system of a 
democracy when materials and methods of instruction are not keyed to 
the formation of the scientific attitude and to the development of the 
ability to use the scientific method. The ability to use and scrutinize 
data, to look beneath the surface of things and to discern relations
between reality and given data, affords an important safeguard against
the dangers of omnipresent propaganda. The problem is to educate
man so that he would rather be guided by fact than by emotion. 

There is a noticeable similarity between arithmetic and statistics with 
respect to use in daily life. Arithmetic is so woven into the fabric of our 
daily life and thought that we use it very often and almost subconsciously. 
With respect to statistics we need only to recall such phrases as “highly
exceptional,” “relatively constant,” “increases the probability,” and
“on the average.” However, arithmetic is a subject taught in all
elementary schools, whereas statistics is taught scarcely at all, although 
its content is likely no more difficult than that of arithmetic at the same 
educational level. 

The practice of applying certain statistical methods, however simple 
they may be, is a critical social need for all. We must not forget that 
even the specialist lives in general society during at least two-thirds of 
the time. For this longer period he is a layman and needs the best possible
quality of layman's understanding. For his guidance in the current
of general human living, he needs statistical training. 

The most important and undisputed use of statistics in daily life is 
connected with all the activities of political, social, and commercial 
institutions which determine the economic and cultural life of a nation. 
In the realm of policy it is the function of statistics to measure the importance
of various problems and to place them in a proper perspective. In
many branches of government factual data already are governing policy 
to a great extent. For instance, the decision to build a number of new 
schools and to engage more teachers implies legislative measures which 
are based on statistical investigations of the school-leaving age, the 
rising birth rate, the increase of population through migration, and other 
factors. Problems in the economic, industrial, and social fields, such as 
increase or decrease of employment, shortage of houses, expansion or 
contraction of existing plants, decrease or increase of crime—these and 
thousands of others should be solved statistically before political action 
can be considered. The whole structure of the national budget depends 
on the sound appraisal of the relationship between potential sources of 
revenue and planned expenditure. Local authorities need statistical 
information for the districts they serve; national agencies need it for the 
country; the organization of the United Nations needs it for the world. 

It is essential that governmental agencies be prepared to make the 
fullest possible use of modern statistical methods. The public is entitled 
to the benefit that may be derived from the progress in research. Old 
methods are often wasteful or have been found unreliable. One should 
expect that government, the foremost user of statistics on a large scale, 
should pioneer in the application of modern statistical methods. 

The urge to apply modern statistical developments seems to be greater 
where an immediate personal advantage is involved in commercial life.
There is even one branch of commercial activity which owes its existence
and all-pervasive development to statistics: insurance. In many other 
branches a combination of technical and statistical knowledge is used. 
The planning of a large factory or combine is now a part of what is known 
as “scientific management.” Many firms have planning departments 
which use statistical returns and charts to a great degree. An unusual 
example is furnished by the seemingly sentimental enterprise of manufacturing
greeting cards, of which approximately three billion are mailed
each year in the United States, involving an annual sale of 135 million 
dollars and postal charges of 100 million dollars. “Statistical planning,” 
taking place in a special department one and a half years ahead of the
exhibition of a card in a store, is the first step toward the sale. Everywhere
knowledge and experience are needed for planning production,
distribution, and sales, although the statistical methods used are often 
not very elaborate. In administration, statistics provide measures of 
performance and efficiency. Although the data do not state the causes of 
inefficiency, if any, and do not directly effect improvement, they are 
pointers; their value depends entirely on the use which is made of them. 

Underlying all planning is the guidance derived from statistical data 
of the past toward the goals desired for the future. An insurance company
quoting rates for an endowment life policy to mature twenty years
hence can and must do so on the basis of an estimate of future interest
rates and past mortality experiences. The size of a new factory is determined
partly by estimates of future demands for the products to be
manufactured. Most goods for consumption are made or ordered long
before they are sold. Consumers, nowadays starting their own organizations,
no less than producers and managers, are dependent, for forceful
action, on the instruments provided by statistical methods. Thus it is
profitable to be able to forecast trends for all economic groups: for business
management comprising large firms with international, long-range
distributions as well as for the individual merchant supplying the immediate
needs of a local neighborhood.

On the other side, employees everywhere are finding that it is of vital 
importance to labor and its aggregate organizations to use statistics, 
which represent tools in the formation of their organizations and programs. 

The United States excels in using methods for forecasting trends in 
every field of industry and public life. 

Statistics in the Sciences and the Arts 

Statistical devices have made their greatest advances in the scientific 
and technical branches of industry, where enterprise and science not only 
meet but are amalgamated. 

Perhaps no branch of mathematical science has had a more rapid
growth than has the science of statistics. In the span of the last sixty
or eighty years the methods of statistics and the probability calculus have
infiltrated one branch of science after another, until they now hold a 
central position in physics, biology, meteorology, chemistry, and 
astronomy. Furthermore, statistics is also growing in significance in a 
number of other fields, such as the political and social sciences. To what 
may this remarkable growth most likely be attributed? 

The introduction of a new theoretical device into a field of knowledge
may often seem incidental in that when it first becomes available,
it is used when it appears to be of value, just as the microscope, X rays, or 
integral equations may be tried out. In the case of statistics, however, its 
introduction was not just casual. 

At first statistics was used apologetically, perhaps with the excuse 
that it was only an expedient to help overcome a temporary shortcoming, 
as in reducing large amounts of observational material in order to study 
details. Thus at first the new “weapon” was tried with the expectation 
that it could be used in the study of detail, as in the study of hereditary 
transmission in individuals from one generation to another, or, as in 
physics, to fill in the gap in knowledge in gas theory with respect to initial 
coordinates and velocities of the single atoms. 

Attitudes in scientific research shift at times, perhaps unintentionally. 
Interest in individuals shifted to the mechanism underlying the behavior 
of aggregates of individuals. It was suddenly realized that even if the 
individual case could be studied in detail, it would be necessary to follow 
up thousands of individual cases in order ultimately to integrate them all 
into one statistical enumeration. 

Charles Darwin was fully appreciative of the essential function of 
statistics in biological study. His theory depended on the law of large 
numbers. Every living species is continually producing a multitude of 
individuals. On the whole, the better-fitted ones live more abundantly 
and have a better chance of survival. The large geometrical progression 
of potential offspring and the enormous destruction of actual offspring 
to be inferred from it constitute the statistical mechanism operating to 
produce the very small increase in the chances of survival that a small 
favorable variation bestows. 

The change in the status of statistics as a subordinate device was 
most drastic in physics. Here it came to take the dominating role of 
defining the goals and showing the ways of reaching them. Thus the 
entire structure of science was shaken, since it rests upon the foundation 
of physics. This role of statistics has led to a new understanding of the 
essential qualities of the laws of nature, namely, the change from a 
deterministic formulation of laws underlying the occurrence of natural 
events to one in terms of statistical regularities, based, as in Darwin’s
theory, on the law of large numbers. This transition from the interpretation
of physical laws based on the notion of causality to one derived
from statistical theories is attributable largely to Boltzmann’s 1 interpretation
of the classical law of entropy, or the second law of thermodynamics,
as it is usually called. According to this interpretation, the
second law rests upon statistics. Rather, it is statistics; that is, it is a
purely statistical law. Heat flows in the direction from higher to lower 
temperature because the chance is only one in many billions that it is 
likely to do otherwise. Events go in the direction in which it is most 
probable that they will move (Ref. 5). 

Further developments, particularly the new quantum mechanics and 
Heisenberg’s uncertainty principle, have revolutionized still more the 
usual conception of the older classical physics and contributed to the 
building of the edifice of the statistical conception of nature. While 
these changes have been taking place, the physicists have developed their 
own statistical methods, particularly quantum statistics, quite apart 
from the methods of statistics in other fields. Statistical ideas are 
utilized in some modern chemical theories, such as the structural formula 
of certain organic substances like rubber and proteins, where chains of 
molecules of different weights and lengths are postulated. For example, 
chemical changes in such substances are interpreted as alterations in the 
frequency distribution of chain length. 

The significance of the general philosophical implication of the statistical
formulations relating to the construction of scientific theories can
hardly be overrated. We are more directly concerned here, however, 
with pointing out briefly the position that statistics holds today in certain 
fields of science and in technology. Since about 1920, the statistical 
approach has been accepted and welcomed by a steadily increasing 
circle of scientific workers, until today this approach is probably one of 
the most characteristic features of modern science. 

The role of statistics in science begins with the interpretation of 
measurements. Even though the methods of the natural sciences are 
the most reliable thus far designed for finding out matters of fact, the 
conclusions drawn from them are only probable, since they are based on 
evidence formally incomplete. This fact is statistically described by the 
attachment of a coefficient of error to the measurement. 

1 It should be noted that the work of Willard Gibbs followed parallel lines.

Take, for example, the measurement of the distance of the sun from
the earth, or, speaking more correctly, the semimajor axis of the earth’s
orbit. This is the most important constant in astronomy, since it
establishes the scale not only of the solar system but also of the whole
universe. It is used in almost any calculation of distances and masses, of
sizes and densities of planets, of their satellites, and of the stars. Therefore,
any error in its calculation is multiplied and repeated in many
different forms. Its importance has stimulated measurements of ever-increasing
accuracy. At present the measurement is 93,005,000 ± 9000
miles (p.e.); 2 that is, the distance is uncertain to 1 part in 10,000. One 
hundred years ago, the uncertainty was 1 part in 20. The progress in 
the development of any science is indirectly given by the size of the errors 
in its measurements. 
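
The arithmetic behind the phrase "1 part in 10,000" can be checked directly. The short Python sketch below uses only the two figures quoted above. The conversion from probable error to standard deviation is an added assumption (normally distributed errors, for which the probable error is about 0.6745 standard deviations) and is not part of the original discussion.

```python
# Figures quoted in the text: 93,005,000 miles with a probable error of 9,000 miles.
distance_miles = 93_005_000
probable_error_miles = 9_000

# Relative uncertainty, and the same figure expressed as "1 part in N".
relative_error = probable_error_miles / distance_miles
parts = round(distance_miles / probable_error_miles)

# Assumption (not from the text): errors follow the normal law, so that
# probable error = 0.6745 * standard deviation.
standard_deviation_miles = probable_error_miles / 0.6745

# Improvement over the "1 part in 20" quoted for a century earlier.
improvement_factor = parts / 20

print(f"relative uncertainty: {relative_error:.2e}")
print(f"about 1 part in {parts:,}")
print(f"implied standard deviation: {standard_deviation_miles:,.0f} miles")
print(f"improvement over 1 part in 20: about {improvement_factor:.0f}-fold")
```

The computed ratio, a little over 1 part in 10,300, agrees with the rounded figure given in the text.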

Laboratory measurements in physics and chemistry are subject to 
experimental errors. Considerable attention has been given recently to 
methods of controlling and evaluating all variables that might conceivably
influence the results. The purpose is to obtain reliable laboratory
standards, such as those of capacity, frequency, and voltage. Particularly
with the development of the sciences of biochemistry and biophysics,
measurements are required on material essentially variable. A wide 
field is under development in which are used such statistical methods as 
sampling, followed by analyzing and testing of the experimental results 
as well as the closely related problem of appropriate experimental 
designs. These methods have increasingly important applications in 
industry. The same situation prevails also to a slight extent in engineering,
mostly in technical control and research. Engineers have developed
methods of their own for dealing with the variation in the materials
which they use. It is likely that the use of statistical methods of treating
variations in these fields would be more efficient than the current
use of the factors of safety. 

Statistical methods are indispensable tools of the industrialist who is 
concerned with the manufacture or purchase of presumably similar 
articles or units on a large scale. However efficient the control of production
may be, the products are bound to vary, and it is necessary to
check the extent of variation by some plan of routine testing. The 
conformity to the requirements of a consignment of raw or manufactured 
materials must be reliably established. Considerable headway has been 
made in recent years in developing efficient statistical methods and 
experimental designs for meeting requirements. The productive process 
must be in a state known as one of statistical control, the criterion for 
which is: the sequence of materials must exhibit the property of randomness.
These are statistical problems, for the solution of which the most
advanced statistical methods are necessary. At times, when operations
were found lacking statistical control, statistical analysis of the results of
routine tests has been used successfully to locate the source of the
unwanted variations. The application of statistical methods can protect 
the consumer against the vagaries of sampling and safeguard the producer 
from the losses incurred by chances “unjust” to him. 
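
The randomness criterion mentioned above can be illustrated with a simple check. The text does not say which test is meant; the sketch below assumes a runs test about the median (the Wald-Wolfowitz runs test), one common way of asking whether a sequence of routine measurements looks random. The function name `runs_test_z` and the sample measurements are hypothetical.

```python
import math
from statistics import median

def runs_test_z(values):
    """Runs test about the median; returns (number of runs, z statistic)."""
    m = median(values)
    # Mark each observation as above (True) or below (False) the median,
    # dropping any that equal the median exactly.
    signs = [v > m for v in values if v != m]
    n1 = sum(signs)               # observations above the median
    n2 = len(signs) - n1          # observations below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    # A new run starts wherever the sign changes.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, z

# Hypothetical routine measurements that drift upward over time.
drifting = [10.1, 10.2, 10.0, 10.3, 10.1, 10.6, 10.8, 10.7, 10.9, 11.0]
runs, z = runs_test_z(drifting)
print(f"runs = {runs}, z = {z:.2f}")
```

For the drifting sequence only 2 runs are observed where 6 would be expected under randomness, which points to a lack of statistical control. With so few observations the normal approximation behind the z value is rough, so the figure should be read as an indication and not as an exact test.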

2 Probable error.

Meteorology is a branch of applied physical science which has a
statistical basis, since weather forecasting utilizes statistical principles
and methods. The meteorologist collects data which are relatively
complex and which are the result of multiple factors operating together 
without control. Hence, he has to apply methods of multivariate 
analysis and also statistical methods developed for dealing with serially 
correlated data. It may be expected that, with the rapid development 
of electronic calculators, striking improvement will be made in solving 
the problem of long-range weather forecasting. The great problem in 
weather forecasting at present is the lack of means to work out all the 
mathematical variables within the period that knowledge of this kind is 
useful. If valid predictions could be made of the weather long enough 
in advance, it might even become possible to do something about the 
weather. Agriculture, shipping, air travel, and other activities would 
benefit by advanced knowledge of the weather. The savings in lives, 
crops, and money would be incalculable. 

In practically all branches of biology, methods of statistics are used. 
Galton, influenced largely by the ideas of Darwin, made quantitative 
studies of biological variation. Much of the recent development in the 
theory and application of statistics arose to meet the need for improved 
tools designed to handle problems in agricultural and biological research. 
There was a need in these fields not only for interpreting observational 
data but also for planning experiments efficiently. 

Genetics is a branch of biological science which seeks to explain the 
resemblances and the differences that are displayed among organisms 
related by descent. Whereas the earlier work in this field was chiefly 
descriptive and empirical, the development of theories based on Mendel's 
discoveries has brought statistical methods to bear more and more on 
the problems. In fact, highly developed statistical methods now constitute
the basis of an important part of the subject. The once conflicting
sciences of biometry and genetics are now closely integrated. 

Public health, epidemiology, and vital records are statistical in character.
The collection and analysis of large masses of data are fundamental
in those fields. Federal and state governments collect data for
informative and directive purposes. The study of population changes 
is somewhat specialized; its facts are the facts of life on which scientific 
planning for the future depends. Populations are recruited by birth 
and depleted by death. 3 The balance between them and the change in 
character of the age-group patterns of the population are subjects requiring
careful and critical statistical analysis. Statistical methods are
increasing in use in research in many branches of medicine, though 
apparently the general practitioner has not been greatly affected by 
statistical ideas. Statistical methods are also fundamental in the 
standardization of biological extracts. In biological assays, such as in
the calculation of the potency of penicillin, insulin, digitalis, and other
drugs, the necessary precision could not be realized except by the use of
modern statistical procedures.

3 Immigration has, of course, been an important factor in the United States.

The psychologist, particularly in the fields of experimental and applied 
psychology, needs a working knowledge of statistical methods. In a 
quantitative inquiry into a psychological problem it is generally necessary 
to measure a limited number of cases. In selecting them the psychologist 
must be sure that they are effectively representative of the population 
from which they are drawn. Usually at least two samples, namely, 
experimental and control groups, are necessary in an experiment. These 
must be so selected as to eliminate any bias of selection with respect to 
characteristics that are related to the investigation. In addition, the 
problems of measurement involve the determination of the reliability 
and validity of the instruments used. Finally, analyzing the experimental
data and drawing conclusions that the data merit are essentially
statistical procedures.
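
One standard way to avoid bias of selection when forming the two groups is to assign the available subjects at random. The text does not prescribe a procedure here, so the following Python sketch is only an illustration under that assumption; the function name `assign_groups` and the subject numbers are hypothetical.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into experimental and control groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical example: twenty subjects identified by number.
experimental, control = assign_groups(range(1, 21), seed=1)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))
```

Because chance alone decides which group a subject joins, no characteristic of the subjects, measured or unmeasured, can systematically favor one group over the other.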

Applied psychology emphasizes the importance of individual differences;
it needs to develop tests for intelligence, skills, and aptitudes of
various kinds. The allocation of individuals to places in society for 
which they are best fitted requires tests of mental and physical traits. 
The statistical methods of multivariate analysis are essential for the 
interpretation and use of such data. The future of human civilization 
depends to a great extent on the capacity of man to understand the factors
and forces governing or controlling his own behavior. In the solution of 
these problems statistical method is likely to play a significant role. 

Psychologists have developed from orthodox statistical methods some 
variants of their own. The methods of factor analysis, for example, are 
used to describe the human mind by means of a small number of psychological
factors.

One of the earliest uses of the term statistics was the description, at 
first verbal and later numerical, of outstanding characteristics of a state. 
The interpretation given in the first issue of the Journal of the Royal 
Statistical Society (Ref. 8) is: “Statistics may be said . . . to be the 
ascertaining and bringing together of those facts which are calculated to 
illustrate the condition and prospects of society.” Social science was 
the parent of statistical method. A characteristic of the method of the 
social scientist was the restriction of his observations to circumstances 
that were not amenable to experimentation. Hence he usually dealt 
with complex cases of multiple causation. The science of economics is 
perhaps the best example of this use of statistics. 

Tippett (Ref. 6) gives three reasons why economics is dependent on 
statistics. One reason is that economic laws, if they exist, pertain to 
mass or group phenomena. The preferences, desires, and reactions of 
millions of people are manifested in economic events. The so-called
“law of supply and demand” applies very widely. The fundamental
assumption underlying the existence of sciences like economics (and
psychology) is that statistical laws are descriptive of human behavior.
A similar assumption underlies a rational approach to business and
political problems. The second reason for the dependence of economic
science on statistics is that only quantitative data, that is, statistics, can
yield laws in the scientific sense. The third reason lies in the nature of 
economic problems. Economic experiments are usually not feasible. 
Hence, if phenomena are to be observed and explained, the method of 
study is essentially statistical rather than experimental. It is not often 
possible to isolate one or a few factors for experimental study as is done 
by the experimentalist in his laboratory. 

In economic research there are three general uses of statistics: they
may (1) serve as information culminating in hypotheses and theories, (2)
be applied to the testing of hypotheses or theories, and (3) furnish
estimates of quantities in economic analysis.

There has not been much cooperation between theoretical economists 
and statisticians. However, the development of statistical methods 
has been notable in economics. The increasing use of such quantitative 
concepts as prices, income, and supply and demand may mean that the 
approach of the statistician eventually will prevail over that of the 
theorist. 

We meet specific problems to which statistical analysis has been 
applied in telegraph and telephone communication, in electric-power 
distribution, in road and rail traffic, and so on. The theory of
probability has been usefully applied in the study of the effects of chance and
other factors in accidents. It has been noted that individuals differ in
their proneness to suffer accidents under given conditions.

Statistical facts and methods play a significant part in the
development of sociology and education as sciences. The collection of statistics
illustrative of the conditions of society has been mentioned as one of the
earliest activities. Each national census depicts our industrial, economic,
and social status at a given time. Social surveys are frequently
conducted in different parts of the country to find out the status of
unemployment, housing, the delinquency of youth, and so on. The method
of inquiry may be by sample, with its own special difficulties and sources 
of error. Sociology stresses the interdependence of social facts and the 
need of considering them in relation to each other. The comparative 
method used at times applies the principle of varying the circumstances 
of a phenomenon with a view to eliminating variable and unessential 
factors. Thus it aims to arrive at what is indispensable and constant. 
Its primary purpose is to make provision for classification of forms of 
social relationships to facilitate causal analysis. Statistical
investigations of crime, of the causes of suicide, and of the conditions under which
certain economic organizations arise illustrate how the comparative
method has been applied.

Educational statistics collected by Federal, state, and local authorities 
provide more and more the basis for educational policies and programs. 
Subjects illustrative of the amenability of educational problems to 
scientific study are: changes in the school population with respect to
age, intelligence, and other characteristics; means of providing equality 
of educational opportunities; the location of youth with special talents. 
Likewise, numerous studies employing the experimental method,
particularly those applying modern principles of experimental design, are
adding genuine knowledge concerning the educational process. 

National opinion polls, such as those of Gallup, Crossley, and Fortune 
magazine, use systematic methods of sampling. The development of this 
means of measuring public opinion is likely to play a significant part in 
the theory and practice of democratic government. 

Statistics is beginning to find application even in such nonscientific 
fields as the arts. In a task comparable to that of the telephone engineer 
who tabulated the frequency of principal words in order to secure the 
best possible transmission, a literary scholar has tabulated the six 
thousand most common words in English, French, German, and Spanish. 
Some points of disputed authorship have been decided by the statistical 
study of the length of sentences. The frequency with which colors and 
sound patterns occur in poetry, the number of types of imagery used by 
Shakespeare, the number of different word classes characteristic of prose 
and poetry of certain periods—all these are illustrations of statistical 
applications. Evidence of errors in the chronology of early Roman 
history has been revealed by certain life tables. The authenticity of 
paintings has been established by means of the frequency of brush marks. 

The work of the mathematical statistician is fundamental in the 
development of statistical science. Here, as in other fields of science, 
basic research contributes general knowledge which affords the means 
of solving a large number of significant practical problems, although a 
specific solution may not be provided to any one problem. The role of 
applied research is to discover complete solutions to specific problems. 
The new knowledge provided by basic research furnishes scientific 
capital, from which source practical applications must be obtained. 
Most of the mathematical theory of statistics in its present character is 
the result of research of recent decades. Perhaps in no field of science 
have the theoretical advances been so sweeping and the practical results 
of such advances so pronounced. The reason may be that the solution 
of theoretical problems was primarily rendered indispensable by the 
urgent requirements of practical research. Furthermore, the principal
contributors to the solution of the theoretical problems discovered the 
actual need for such solutions in their direct contact with the problems of 
practical research. 
There is usually a gap between theoretical developments and practice
in scientific fields, and this gap is also characteristic of statistics. The 
width of this gap varies in the several applied fields, and there is even a 
wide variation among workers within the same field with respect to the 
quality of statistical methods used.

The rapid development of statistical science has, of course, left many 
problems unsolved, both theoretical and practical. It may be expected 
that theoretical studies of statistics will increase in the immediate future, 
leading to greater rigor of its theoretical structure. As is characteristic 
of all scientific subjects, statistical science is never finished and complete: 
it is dynamic, developing always. The result will be more and more 
rigorous methods (Ref. 2). This development is likely also to take in 
areas and fields where new types of observational data and new kinds 
of observations will be sought. Also, supplementary mathematical 
researches will be found necessary before workers in such fields can carry 
out their studies with the high standards of competence employed in 
fields where statistical methods are firmly rooted. 

Mention should be made of the mechanization of statistical
calculations. The generation of calculators as users of logarithms and prepared
tables of mathematical functions and other aids has been succeeded by 
one which knows only how to produce figures mechanically. Commercial 
machines for accounting and for scientific computation have done much 
to benefit business, government, and science. It is not only in removing 
the drudgery of reducing large masses of statistical data that the
mechanization of statistics is important: with the development of machines
based upon the principles of electronics rather than of the cogwheel, the
most complicated and advanced mathematical applications become
practically solvable for the first time. The significance of this development
for the solution of theoretical as well as practical problems in science is 
just beginning to be realized. The impetus given to this development 
by the exigencies of World War II was very great. No matter how 
rapidly one machine is produced, when finished it seems to be almost 
obsolete, so swift is progress. Therefore, any description of the electronic 
calculator which is given here is likely to be soon superseded. 

The electronic numerical integrator and computer, the Eniac,
invented and perfected at the Moore School of Engineering of the
University of Pennsylvania, does not have a single moving mechanical part.
Only the tiniest elements of matter—electrons—move within its 18,000
vacuum tubes and several miles of wiring. This amazing machine
completes in two hours a mathematical task which 100 trained men could do
only in a year. Since all mathematical tasks, however abstruse or 
involved, can be reduced to basic arithmetic if ample time is provided, 
this machine practically eliminates time to give the answers to virtually 
any problem. That is, basically the machine does nothing more than 
perform the fundamental arithmetic processes. This it does by the
generation of very precisely timed electrical impulses. These impulses
are formed at a speed of 100,000 per second, which is equivalent to one
operation every twentieth impulse, thus adding, for instance, at the rate
of 5,000 per second. The Eniac has four kinds of memory. One of
these “minds” performs the task of indicating the initial and boundary 
conditions of the problem. All problems must first be broken down into 
their essentials, which are then punched on cards. These cards are then 
run through a machine unit known as the “reader.” The reader acts 
as the translator of the mathematical language to the language of the 
machine, and vice versa. The values of certain scientific constants are 
introduced when required. The machine can handle numbers of 20 
digits. 

Machines have already been planned to solve problems running into 
400 stages, that is, machines which have a “memory” of 400 numbers.
Such a machine could solve 100,000 different equations in approximately 
one minute. 4 

The illustrative rather than exhaustive review that has just been 
presented has attempted to portray the realm of statistics all the way 
from daily life through theoretical and applied science. If the purpose 
has been achieved, the all-pervasive character of statistics should be 
realized. A knowledge of statistics—at least of its logic and its
dependence on the data of experience—is indispensable to everyone in the
practical affairs of human society. Statistical science has likewise
pervaded both the theoretical and applied aspects of the biological, physical,
and social sciences. In fact, every observable event in the behavior of 
man, as well as in the behavior of rocks and stars, is amenable to scientific 
treatment and correlation with other events. In this analysis, statistical 
methods have come to play a necessary part if such data are to be assayed 
with scientific precision and if the reliability of the information is to be 
determined with objective validity. 

The bricks of experience and the mortar of reason are the twin
supports upon which the indestructible foundation of science is built. The
essence of science is the rational ordering of the facts of experience. In 
this process the data of experience are represented by concepts. The 
concepts are defined in a manner which facilitates the interpretation of 
rational relation between experiences. Although the derivation of these 
relations involves pure reasoning, statistical methods based on the theory 
of probability contribute in the drawing of inference and conclusions by 
specifying the degree of uncertainty involved. 

Statistics in all its aspects is accordingly of interest and importance 
to a large number of classes of people. However, there are few if any 
individuals, including professional statisticians, who can be experts in 


4 See the discussion of meteorology, page 7.

all branches of statistics, because they would then need to be expert in
many branches of knowledge, including the foundations of statistical 
science itself as well as the many fields of application. Statistics is both 
a science and an art. Statistics is a science because its methods are 
basically systematic and of wide application. Statistics is an art because 
success in its application is dependent on the skill, special experience, and 
knowledge of the person using it in the field to which the application of 
statistical methods is made. Such qualifications are necessary because 
the data collected in any field are the manifestations of persons or things 
with which the statistician needs a first-hand acquaintance. 

It is, therefore, of importance that the author of any text in statistical 
methods make clear the purpose and scope of his book. 

Statistics in This Book 

The traditional and popular notion of the function of a statistician 
is the collecting, tabulating, and describing of long records of figures. 
These records are conveniently summarized by the calculation of
averages, percentages, index numbers, and other descriptive measures, and
by the construction of one or more of the kinds of tables, graphs,
diagrams, or charts. This process of reducing data to certain summary
values has been greatly aided by high-speed machines for tabulation and 
calculation. The collection of experimental and other observational 
data is, of course, an indispensable part of the scientist’s work.
However, the function of a statistician, as now recognized in many branches
of science, goes far beyond the collection and processing of numerical
data for descriptive purposes. The less widely known activities include
his contributions to advances in mathematical statistics basic to the
creation of tools of scientific value. These tools give precision to tests of
scientific hypotheses. They also indicate how observational studies
including experiments must be planned, whether under laboratory,
factory, or field conditions, to provide the most reliable and valid
information with the least expenditure of time, energy, and money.

The emphasis of this book is on the interpretative rather than on the 
descriptive function of statistics. This book also aims to present the 
theoretical foundation of modern statistics, not as an end in itself but 
principally to provide the background for the intelligent application of 
modern statistical methods. The medium for developing an
understanding of the theoretical foundation is primarily empirical and logical,
supplemented at times by mathematical formulation. The complete 
exposition of the mathematical theory of modern statistical methods is, 
however, beyond the scope of this volume. Such information would be 
of interest chiefly to mathematical statisticians, since a thorough
understanding of the theory of modern statistical methods requires a fairly
advanced knowledge of pure mathematics. Until recently, the basic
researches in the mathematical theory of statistics were rather widely
dispersed among scientific journals, but books dealing principally with 
the mathematical theoretical foundation of modern statistics are now 
available (Refs. 1, 3, 4, and 7). Thus, although from the mathematical 
standpoint this book is not self-contained, it is written for readers without 
specialized mathematical training. 

The theoretical presentation in this book has been based, as it must 
be for present-day needs, on original and secondary sources of
mathematical statistics. It is assumed that certain aspects of this theoretical
background must be clearly understood if statistical methods are to be 
put to intelligent use. One basic conception is that one must know how 
to choose the most effective statistical tool for the purpose in mind. A 
second is that one must know the basic assumptions underlying the 
statistical tool selected. A third basic conception is that one must first 
test to see if the assumptions are fulfilled by the particular situation to
which the tool is to be applied. By continuous emphasis in this text 
upon these requisites, it is proposed that the user of statistical methods 
will become habituated to the practice of critical examination and
selection rather than to applying statistical methods blindly or in a
rule-of-thumb manner.

Let us repeat: statistical method is based on the same fundamental 
ideas and processes as is the general scientific method. Thinking 
statistically is equivalent to thinking scientifically. This kinship
underlies the development of the principles of statistical methodology. The
more complete understanding of scientific methods is a direct aim in the 
presentation of this text. Reasoning skepticism, scientific caution, and 
common sense are urgently needed in statistics. 

The more significant contributions to statistics since the early 1920’s 
have been made in the development of the foundations for the problem of 
statistical inference. The principles of statistical inference deal with 
two chief problems: that of testing statistical hypotheses and that of 
statistical estimation. These, then, are the two fundamental statistical 
problems of the research worker. The presentation of the theoretical 
aspects of these two problems, with special emphasis on their practical 
aspects, constitutes the principal content of this book, which has been 
arranged with the view of presenting the main ideas underlying statistical 
inference in a logical developmental order leading to a functional
understanding of the principles.

The concepts underlying probability and likelihood as they are used 
in statistics are given first, since probability theory plays the primary 
role in statistical inference. The fundamental theorems of direct
probability follow. We proceed with other theorems which, in turn, lead to
the classical binomial, normal, and Poisson distributions. 

We then discuss the development of sampling theory and its use in 
problems of statistical inference. The selection of representative samples
receives considerable emphasis in keeping with the requirements of 
present-day research. 

This background should prepare the student to understand the testing 
of statistical hypotheses. Many illustrations of current procedures of 
testing statistical hypotheses are presented. The problem of estimating 
parameters from sample values is then treated. The following are 
considered: the properties of “best” estimates; the form of the frequency 
distribution of observational values in relation to the most accurate 
estimates; and two methods of forming estimates—the method of
maximum likelihood and the method of interval estimation. Many original
practical problems are worked out by way of illustration. 

The interpretative function in statistical analysis has been mentioned 
as one of major concern. The fact is, however, that the interpretation 
of a body of data requires a knowledge of how it was obtained. It is of 
equal importance that conclusions drawn from observational results be 
based on detailed knowledge of the procedures employed in the
investigation. Thus, the major function of a statistician is to design experiments
and to plan investigations which will yield maximum information and 
valid conclusions. This responsibility of a statistician is stressed 
throughout. 

Considerable space has been allotted to the technique of the analysis 
of variance, the most powerful statistical tool yet devised for analyzing 
sources of variation. Modern experimental and sampling designs require 
this technique for the analysis of their results. Related problems such 
as those in regression are also included. 

A thorough understanding of the problems of the field in which one 
works is essential when statistical data from this field are to be collected 
and analyzed. To develop statistical craftsmanship, one must acquire 
skill by observation and much practice. The aim of this book is to assist 
students and research workers who require technical aid in the design, 
execution, and interpretation of quantitative researches which may 
originate in the laboratory or in the field. This book is designed just as 
much to help a student to become a competent critic of the research 
literature in his field. 

The content of this text is based largely on the theory and application 
of those statistical methods which are of general importance. The same 
formula is applicable to diverse groups of subject matter, as is true of 
other branches of mathematics. The specialized uses of statistics 
involve no great alteration of structure; rather, the specialization consists 
in the way in which statistics is applied. No attempt has been made in 
this book to present illustrations from the many varied fields to which 
statistical methods can be usefully applied. The student should become 
competent to deal with many analogous problems through a study of the 
statistical processes illustrated in the examples. He is, therefore, invited
to work through the numerous examples in all numerical detail, so that 
he may learn how to apply the same methods not only to the unsolved 
problems given in the text but also to those encountered in his readings 
and, above all, in his own research. 

Much care has been given to the practical arrangements of numerical 
calculations. The analysis of the results obtained from modern and 
original experimental designs has been given special attention. 
CHAPTER II


PROBABILITY AND LIKELIHOOD 

The work of a scientist is in part practical: he designs experiments 
and makes observations. Another part of his labor is theoretical: he 
formulates conclusions from his experimental findings, compares his 
results with those of other workers, constructs a theoretical system so as 
to represent and order the facts of observation as accurately as possible, 
or notes their conformation to existing theory. With the aid of the theory 
he derives predictions, which he again validates by new observations.

The Basis of Statistical Inference. In most, if not all, of these 
activities of the modern scientific worker, statistical methods play a 
significant part. If the experiment or investigation is to lead to explicit, 
unequivocal, and convincing results, it must be planned so that the data 
are capable of clear-cut statistical interpretation. The testing of
underlying assumptions, the drawing of inferences from sample to population
or from observation to hypothesis, and the derivation of predictions are 
all based upon intelligent statistical analysis. 

One of the most hazardous acts of the research worker is the drawing 
of inferences or conclusions from experimental data. This act is a
process of reasoning from the part to the whole, from sample to population,
from the particular to the general, or from effect to cause. This step
is difficult, it seems, because the experimental results pertain to the
experiment or sample, whereas the inference or conclusion refers to the
population, of which the experiment or sample is only a very small part.

The inferences drawn from sample to population are uncertain. 
Even so, these inferences can be rigorous, because they may be made so 
as to include within themselves a quantitative specification of the kind 
and amount of uncertainty involved. Upon this achievement depends 
the validity of the process of acquiring new knowledge by observation or 
experiment. Science can progress by collecting new experiences as well 
as by the better ordering of those already possessed. It is primarily by 
the former process that new knowledge comes into being. 

The statistician’s contribution to the problem of drawing conclusions 
from experimental results consists in (a) setting up the requirements for 
the design or the logical structure of the experiment and (b) interpreting 
the data. While these two aspects of the process of adding to scientific 
knowledge are closely related, our principal concern for the present is to 
consider the general problem of statistical inference. As has been noted 
in Chapter I, there are two chief problems of statistical inference: that
of testing statistical hypotheses and that of estimation. Preliminary
to the direct consideration of these problems, it is desirable to develop
some fundamental ideas and theorems, which have their origin in
probability theory. The interpretation of experimental data is based on the
application of probability theory. This theory is planned to provide the 
mathematical model of the empirical facts, that is, the data with which 
the statistician works. 

Setting up a Model. In looking for a solid theoretical foundation 
upon which to build a model, the statistician must make clear just how 
far the concepts which he uses are justified and are requisite. The 
justification of the logical system he develops rests upon the
demonstration of its usefulness in describing the results of experience. The events
and objects of the world of reality are always very complex. The 
scientifically trained mind is required to identify the characteristic or 
salient point from among the vast number present as an essential
condition from the standpoint of theory. Because the objects of the world
of reality cannot be comprehended in a way that could lead to an exact
theory, they are superseded by idealized conceptions which can be
comprehended with comparative ease. The object of creating theoretical
models is to permit the mental reconstruction of the world of empirical 
fact. This statement is not equivalent to saying that the theory
necessitates putting the empirical facts into an inflexible predetermined scheme.
On the contrary, the theoretical system must be constructed so that the 
facts are truthfully represented. A scientific theory may be abstract 
not only in that it encloses a collection of selective facts but also in that it 
covers a set of ideal objects, such as wave function in physics and the 
plane in geometry. Yet when such theories encompass real objects 
to close approximations, they may serve a useful purpose. The
statistician begins his work in developing efficient working tools for the research
worker by building a simplified model by which he proposes to represent 
the phenomena of observation with reliability sufficient to supply useful 
results. 

Statistical Interpretation of Probability. The principal function of 
statistics is to describe certain characteristics of mass phenomena 
and repetitive events. From the theoretical point of view, unlimited 
sequences of events or of similar observations are referred to as statistical 
universes or populations or collectives. Much of theoretical statistics is
built up around the idea of an infinitely large hypothetical population of 
which the observational data make up a sample. The idea of an infinite 
parent population from which samples are taken is a mathematical 
abstraction. Populations with which we deal in practice are finite. The 
infinite population may be considered as a limiting case of a finite
population when the number of individuals increases indefinitely. In
experimental work, also, a hypothetical infinite population may be considered
as an infinite population of all experiments that might have been carried
out under the conditions of an observed experiment. The individual
experiment is interpreted as a random selection from the infinite
population so defined.

A population is an aggregate of individuals. The individual case 
is of interest to the statistician chiefly because it is from the collection of 
individuals that the characterization of the population becomes possible. 
Even if the interest were in the individual case, information would need 
to be collected for thousands of individuals, and perhaps no other eventual 
use of them would be made besides combining them under a single
statistical generalization. Although in this treatment the identity of any
particular individual is irretrievably lost in the aggregate, it does not 
follow that we cannot say anything about the individual from the 
knowledge we have of the population. Take, for instance, the frequency 
distribution of the ages of the 24,395 high-school graduates as recorded 
in Table 45, page 202. Let us take a single individual from the group or 
population of 24,395. Even though we do not know his age, we know 
that he will be an exceptional individual with respect to age if he is 
of less than, say, sixteen years. It can be said that he will be one of 
84/24,395ths of the group. He will, of course, more likely be one of the 
12,148/24,395ths of the group. It is more convenient, when dealing with 
problems of this type, to use a term commonly called odds or probability. 
In the illustration just cited, it can be said that the odds or probability of 
any one individual's being less than sixteen years at the date of graduation 
from high school is 84/24,395 = .0034, and the probability of his age being 
eighteen is 12,148/24,395 = .498. This interpretation of probability 
is the one usually accepted in modern statistics; that is, probability is the 
ratio of frequencies. As in this illustration, so in any frequency
distribution: statistical probability may be considered as the means by which
the characteristics of the whole distribution may be ascribed to the 
random individual. 
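
The frequency-ratio interpretation above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the three counts quoted in the text (84, 12,148, and 24,395); the function name `relative_frequency` is a hypothetical helper introduced for illustration, and the remaining rows of Table 45 are not reproduced here.

```python
from fractions import Fraction


def relative_frequency(count: int, total: int) -> Fraction:
    """Return count/total as an exact fraction (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return Fraction(count, total)


# Counts quoted in the text for the 24,395 high-school graduates.
TOTAL_GRADUATES = 24_395
p_under_sixteen = relative_frequency(84, TOTAL_GRADUATES)
p_eighteen = relative_frequency(12_148, TOTAL_GRADUATES)

print(f"P(age under sixteen) = {float(p_under_sixteen):.4f}")
print(f"P(age eighteen)      = {float(p_eighteen):.3f}")
```

Exact fractions are used so that the ratio of frequencies is kept as such and rounding happens only when the value is printed.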

The long-standing controversy over the nature and meaning of 
probability need not detain us here. We may merely mention that the 
psychological and subjective interpretation should be kept distinct from 
the objective or operational interpretation of relative frequencies. 
Probability is associated with our subjective sense of expectancy just as 
a thermometer reading is linked with our subjective sense of heat and 
cold. The evaluation of probabilities from given data on the basis of 
standard calculations of secondary from primary probabilities is objective 
in the sense that this manner of derivation is acceptable to most modern 
statisticians. 

Two definitions of probability may be cited here. (1) Von Mises 
(Ref. 3) defines the probability of an event as the limit of the relative 
frequency of this event in an infinite sequence of trials, the Kollektiv,
fulfilling certain specified conditions. In a purely mathematical sense,
the existence of this limit is assumed to be axiomatic. (2) Kolmogoroff 
(Ref. 2) gives the most comprehensive discussion of probability from the 
standpoint of measure. He defines probability as a set function which 
fulfills a certain system of axioms. This theory starts with the concept 
of the frequency ratio but does not postulate that definite limits of 
frequency ratios exist. It builds around the concept of a random
variable, that is, by considering the probability of an event as a number
connected with the event. The axioms postulated in the theory express
the principles for operating with the numbers. With respect to
application, the two theories are largely equivalent. However, the limiting
properties of frequencies involved in definition (1), rather than the pure 
mathematics of abstract ensembles occurring in definition (2), will be 
accepted as the basis of the frequency theory of probability insofar as it 
is used in our present discussion. 
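
For reference, the system of axioms mentioned under definition (2) is usually written in modern notation roughly as below. This is only a sketch: the symbols $\Omega$ (the set of all possible outcomes), $\mathcal{F}$ (the class of events), and $A_i$ are notation assumed here and do not appear in the text.

```latex
% Sketch of the axioms for a probability set function P.
% \Omega, \mathcal{F}, A, and A_i are notation assumed for illustration.
\begin{align*}
&\text{(i)}   && P(A) \ge 0 \quad \text{for every event } A \in \mathcal{F}, \\
&\text{(ii)}  && P(\Omega) = 1, \\
&\text{(iii)} && P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
               \quad \text{for pairwise disjoint events } A_1, A_2, \ldots \in \mathcal{F}.
\end{align*}
```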

Thus, the true probability, $P_{12}$, of getting a double 6, or sum 12, in
one throw of two dice is defined as

$$P_{12} = \lim_{n \to \infty} \frac{n_{12}}{n},$$

assuming that the limit exists,
where $n_{12}$ is the number of times a score of 12 is obtained in $n$ throws
of the two dice. Similarly, probability values can be determined for
each of the other possible totals. On a priori grounds, a tentative or
hypothetical probability could be assigned to the true probability.
However, probability in the sense used here in statistics depends for its
meaning on aggregates of phenomena or repeated events. Although
the value of $P_{12}$, for instance, can never be reached in practice, it can be
attained within an arbitrary degree of certainty by making $n$ sufficiently
large. According to a theorem by James Bernoulli, the probability that
the relative frequency $n_{12}/n$ will be adjacent to $P_{12}$ is arbitrarily near to 1
for a sufficiently long sequence of trials.

Example 1. An Experiment in Probability. We shall illustrate 
some of the main points in probability theory by considering an 
experiment consisting of the throws of a pair of dice. This experiment was 
repeated a large number of times. The sequence of throws of the pair 
of dice gives rise to a sequence of numbers, the variable consisting of the 
sums of the several combinations of the two sets of dots on the two upper 
faces of the dice after each throw, that is, 2, . . . , 12. The conditions 
of each throw were kept as uniform as possible. The systematic record 
of the results of sequences of this kind constitutes a set of statistical data 
relative to the events observed. Six sets of data, resulting from 36, 360, 
3,600, 36,000, 180,000, and 360,000 throws, are recorded in Table 1. 
The data are arranged in frequency distributions which show the number 
and per cent of occurrences for each of eleven possible events, 2, . . . , 12. 
The hypothetical or theoretical distribution arrived at on a priori grounds 
is also recorded in the first main column. 

The probability of getting a 12 in the throw of two dice shows some 
fluctuation in the six series of experiments, ranging in value from .028 to 
.033. A similar situation holds for each of the other totals. The true 
probability of getting a 12, $\lim_{n \to \infty} n_{12}/n$, though never 
reached in practice, can be approached closer and closer by increasing 
the size of $n$. On this basis, the value .029 determined by 360,000 
throws would give the best approximation; likewise for the other totals. 
In this way probability statements are based on the results of empirical 
investigations. 
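
The behavior of the relative frequency described above can be imitated with simulated dice. The following is only an illustrative sketch: the function name, the fixed seed, and the use of Python's pseudo-random generator in place of physical dice are assumptions made here, and the figures it prints are not those of Table 1.

```python
import random


def relative_frequency_of_12(n_throws, seed=0):
    """Throw two simulated dice n_throws times and return n_12 / n."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    n_12 = sum(
        1
        for _ in range(n_throws)
        if rng.randint(1, 6) + rng.randint(1, 6) == 12
    )
    return n_12 / n_throws


if __name__ == "__main__":
    # Series of increasing length; the a priori value is 1/36.
    for n in (36, 360, 3_600, 36_000, 360_000):
        print(n, round(relative_frequency_of_12(n), 4))
    print("a priori:", round(1 / 36, 4))
```

In such a run the short series typically fluctuate a good deal, while the longest series usually lies close to the a priori value 1/36.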

The erratic or haphazard behavior of the fluctuations of the variable 
from throw to throw is usually spoken of as randomness. Even with the 
utmost care in keeping all relevant factors under control, the results 
vary from observation to observation in such an irregular way that exact 
prediction of any single event is impossible. The sequence may, 
therefore, be called a sequence of random experiments. It is noted, however, 
that, in spite of the unpredictable behavior of individual results, the 
average results of long sequences of the random experiments exhibit a 
striking regularity; this regularity may be inferred from the similarities 
among the several percentage frequency distributions. It is this 
phenomenon that serves as the basis for the mathematical theory of statistics. 

The hypothetical value of probabilities may at times be very useful in 
furnishing clues to true probabilities. 

We may use the theoretical values of P to determine the mathematical 
expectation, a concept that will be encountered later in sampling theory. 
The mathematical expectation of any quantity is the sum of all the values 
it may assume multiplied by their respective probabilities: 

$$E(X) = P_1 X_1 + P_2 X_2 + \cdots + P_n X_n = \sum_{i=1}^{n} P_i X_i \qquad (2.01)$$

Formula (2.01) shows that the mathematical expectation is the weighted 
arithmetic mean of a variable where the different probability values, the 
$P_i$'s, provide the weights. The mathematical expectation of the throws of 
two dice is given by: 

$$
\begin{aligned}
E(X) = {} & \tfrac{1}{36}(2) + \tfrac{2}{36}(3) + \tfrac{3}{36}(4) + \tfrac{4}{36}(5) + \tfrac{5}{36}(6) + \tfrac{6}{36}(7) \\
& + \tfrac{5}{36}(8) + \tfrac{4}{36}(9) + \tfrac{3}{36}(10) + \tfrac{2}{36}(11) + \tfrac{1}{36}(12) = 7
\end{aligned}
\qquad (2.02)
$$
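
The arithmetic of (2.02) can be checked by enumerating the 36 equally likely combinations of two dice. This is a minimal sketch; the function names are chosen here for illustration, and exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """A priori probabilities of the totals 2, ..., 12 for two fair dice."""
    counts = {}
    for a, b in product(range(1, 7), repeat=2):
        counts[a + b] = counts.get(a + b, 0) + 1
    return {total: Fraction(c, 36) for total, c in counts.items()}


def expectation(dist):
    """Formula (2.01): sum of each value times its probability."""
    return sum(p * x for x, p in dist.items())


dist = two_dice_distribution()
expected_total = expectation(dist)
print(expected_total)
```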

Summary. In the interpretation of probability statements on the 
basis of relative frequencies, the following points are essential (Ref. 4): 

(1) The probability of an event has meaning only when the individual 
event is an element of the specified reference class. 

(2) The objective values involved in probability statements grow 
out of their determination through empirical investigations. 

(3) Since probability relates to the property of an object in a specified 
reference class, a given property can be associated with various 
degrees of probability referred to different reference classes. 

(4) The direct evidence for probability statements is statistical in 
character, since the definition of such statements is explicitly 
stated in terms of relative frequencies. There are, however, 
cases where indirect evidence provides estimates of the 
probabilities and validation of them: for example, when probability 
statements are a part of a system of statements. 

(5) Every probability statement defined as the limit of a relative 
frequency is a hypothesis which is incapable of complete 
confirmation or final verification by means of the finite evidence available 
at any specified time. 

(6) Probability statements to be used successfully for specifying the 
occurrence of designated properties in definite classes with stable 
relative frequencies are not dependent upon “deterministic” or 
“indeterministic” issues. 

Fundamental Theorems of Direct Probability. The function of the 
calculus of probability is to derive probabilities of compound events from 
sets of initially given probabilities. Thus, in the example of dice casting 
given above, the probability of throwing a 6 with a die is not a problem 
in the calculus of probability; but, given this probability, the probability 
of getting 12 in the throwing of two dice is such a problem. It should be 
recognized that the propositions asserted in the calculus are only analytic 
of the definitions and rules originally specified, as in the case of 
demonstrative geometry, for instance. The probability calculus thus makes 
possible the derivation of the relative frequencies with which certain 
events occur from the initial probability statements, without the 
specification in the statements of what the actual frequencies are. In thus 
making definite the predictions which the probability statements involve, 
the calculus enables us to check the content of the statements. In this 
section a few of the standard rules regulating the calculus of direct 
probabilities will be given. Most of the science of statistics is built upon 
the explicit or implicit application of these fundamental rules (Refs. 1 
and 4). 

It is assumed, to begin with, that probability is measurable on a 
continuous scale. Thus, a probability is a real number, and any two 
measures of probabilities are comparable, that is, $P_1 > P_2$, 
$P_1 = P_2$, or $P_1 < P_2$. 

The probability of a proposition A on data R is written 

$$P(A \mid R)$$

Thus, we may state as 

Rule 1. If R entails A, $P(A \mid R) = 1$. 

If R entails not-A, $P(A \mid R) = 0$. 

Thus, if an event is certain to happen, its probability is 1; if it is 
certain not to happen, its probability is 0. The range on the probability 
scale is from 0 to 1. Any value between these limits is, therefore, a 
positive proper fraction. 

Rule 2. If $P_1, P_2, \ldots, P_n$ are the probabilities of n mutually 
exclusive propositions $A_1, A_2, \ldots, A_n$ on data R, then the 
probability that one of the propositions is true is 
$P_1 + P_2 + \cdots + P_n$. Symbolically: 

$$P(A_1 \text{ or } A_2 \text{ or } \cdots \text{ or } A_n \mid R) = P(A_1 \mid R) + P(A_2 \mid R) + \cdots + P(A_n \mid R)$$

Thus, if one ball be drawn from a bag containing four white, five 
black, and seven red balls, since the chance of its being white is 1/4 and 
of its being black is 5/16, the probability of its being either white or 
black is 9/16. 

Rule 3. The probability of two propositions A and B on data R is 
the product of the probability of A given R and that of B given A and R. 
Symbolically, 

$$P(AB \mid R) = P(A \mid R)\, P(B \mid AR)$$

More generally, 

$$P(A_1 A_2 \cdots A_n \mid R) = P(A_1 \mid R)\, P(A_2 \mid A_1 R)\, P(A_3 \mid A_1 A_2 R) \cdots P(A_n \mid A_{n-1} \cdots A_1 R)$$

Thus, the probability of drawing a second white ball from a bag 
containing five white and four black balls, the ball first drawn being 
white and being returned before the second drawing, is 5/9 × 5/9, or 25/81. 

The probability of becoming a total orphan is the product of the 
probabilities of being bereaved of father and of mother. 
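
The two ball-drawing examples can be verified with exact fractions. The sketch below uses helper names invented for illustration; the final line, which treats drawing without replacement, is an added contrast and is not an example from the text.

```python
from fractions import Fraction


def prob_either(*probs):
    """Rule 2: sum of the probabilities of mutually exclusive propositions."""
    return sum(probs, Fraction(0))


def prob_both(p_a, p_b_given_a):
    """Rule 3: P(AB|R) = P(A|R) * P(B|AR)."""
    return p_a * p_b_given_a


# Rule 2: one ball from a bag of 4 white, 5 black, and 7 red balls.
white, black, red = 4, 5, 7
total = white + black + red
p_white_or_black = prob_either(Fraction(white, total), Fraction(black, total))

# Rule 3: two white balls from a bag of 5 white and 4 black balls,
# the first ball being returned before the second drawing.
p_two_whites_with_replacement = prob_both(Fraction(5, 9), Fraction(5, 9))

# Added contrast (assumption: first ball NOT returned), so P(B|AR) = 4/8.
p_two_whites_without_replacement = prob_both(Fraction(5, 9), Fraction(4, 8))

print(p_white_or_black, p_two_whites_with_replacement,
      p_two_whites_without_replacement)
```

The contrast shows why the second factor in Rule 3 is conditional on A: when the first ball is not returned, the probability of the second event changes.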

The rules for the logical sum of events (Rule 2) and for the logical 
product (Rule 3) are basic in the elementary calculus of probability. 
From them, by the application of the ordinary rules of logic and 
arithmetic, it becomes possible to derive significant consequences. One such 
derivation is Bayes’s theorem, which, from the consequences drawn from 
it, often plays a conspicuous part in treatments of the foundations of 
probability and scientific method. Symbolically, it may be stated as 

$$P(A_i \mid RH) \propto P(A_i \mid H)\, P(R \mid A_i H)$$

That is, the probability of $A_i$, given R and H, is proportional to the 
probability of $A_i$, given H, multiplied by the probability of R, given 
$A_i$ and H. The factor on the left, that is, $P(A_i \mid RH)$, is called the 
posterior probability; the first factor on the right, $P(A_i \mid H)$, the 
prior probability; and the remaining factor, $P(R \mid A_i H)$, the 
likelihood. 

In order to make any practical use of Bayes’s theorem, it is necessary 
to decide on the values to be ascribed to the prior probabilities. Bayes 
and Laplace postulated that, in the absence of definite knowledge, the 
antecedent probabilities are to be assumed equal. This postulate has 
been relentlessly attacked, especially in recent years, by statisticians on 
the grounds that it supplies by hypothesis data unavailable through 
empirical or, more particularly, statistical investigations. 
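
A small numerical sketch may make the proportionality concrete. The two hypotheses below (two bags with different numbers of white balls) are hypothetical and are not taken from the text; the equal priors follow the Bayes-Laplace postulate purely for illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Normalize prior * likelihood over a finite set of hypotheses A_i."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical data R: a white ball is drawn from one of two bags.
# A_1: bag with 5 white balls out of 9; A_2: bag with 2 white out of 9.
likelihoods = [Fraction(5, 9), Fraction(2, 9)]    # P(R | A_i H)
equal_priors = [Fraction(1, 2), Fraction(1, 2)]   # P(A_i | H), assumed equal

post = posterior(equal_priors, likelihoods)       # P(A_i | R H)
print(post)
```

With equal priors the posterior probabilities stand in the same ratio as the likelihoods, which is why the choice of priors carries so much weight in the controversy described above.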

The Principle of Maximum Likelihood. Since in most cases it is 
practically impossible to assign values of empirical significance to the 
a priori probabilities in Bayes’s theorem, the theorem has only a limited 
use. Therefore, it plays a very minor role as a means for determining the 
probability of a given hypothesis on the grounds of the available evidence. 

Statisticians who reject Bayes’s postulate supplant it with a different 
principle based on the use of likelihood. That is, for any $A_i$ and H, 

$$P(A_i \mid RH) \propto P(A_i \mid H)\, L(R \mid A_i H),$$

where the factor $L(R \mid A_i H)$ stands for the likelihood function. 

The principle of maximum likelihood states that, when the problem of 
choosing from a number of hypotheses, $A_i$, arises, we are to choose the 
one (assuming it exists) that maximizes $L(R \mid A_i H)$. That is, we are 
to select the hypothesis which gives the maximum probability of the 
observed event. 
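
As a minimal sketch of the principle, suppose the hypotheses are a few candidate values of a success probability p and the observed event is a count of successes in a fixed number of trials. The candidate values and the observed counts below are hypothetical, and the binomial form of the likelihood is an assumption made for this illustration.

```python
from math import comb


def binomial_likelihood(p, successes, trials):
    """Probability of the observed count if the success probability is p."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


def maximum_likelihood_choice(hypotheses, successes, trials):
    """Select the hypothesis giving the observed event its largest probability."""
    return max(hypotheses, key=lambda p: binomial_likelihood(p, successes, trials))


# Hypothetical observation: 3 successes in 10 trials.
candidates = [0.1, 0.3, 0.5, 0.7]
best = maximum_likelihood_choice(candidates, successes=3, trials=10)
print(best)
```

No prior probabilities enter the choice; only the likelihood of the observed event under each hypothesis is compared.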

Other Theorems in the Calculus of Probability. The previous rules 
governing direct probability calculations are based on the assumption 
that the relative frequency of a proposition referred to a specified class 
of objects or events has a limit. There are other theorems in the calculus 
which require the fulfillment of additional assumptions. One of these 
is that the condition of irregularity obtains in the reference classes. This 
condition is known as a random character. It may be spoken of here 
as a method of selection which affords an equal probability to certain 
propositions and thus permits the application of the calculus of 
probability a priori. 

The irregular Kollektiv, by which is meant an infinite sequence of 
observations, is the foundation of the mathematical theory of probability 
advanced by von Mises. The condition of randomness, or impossibility 
of a gambling system, which the Kollektiv must satisfy, means that if the 
relative frequency of some particular attribute is calculated in a 
subsequence of the Kollektiv, selected by some method which is independent 
of the Kollektiv itself, it must tend to the same limit as it does in the 
original Kollektiv. Randomness is fundamental in the theory of 
sampling to be discussed later, since the theory deals principally with 
samples generated by such processes. 

The Binomial Distribution. For reference classes satisfying the 
condition of random character, the following can be shown: If the probability 
of having a specified property, say S, called “success,” is p, and the 
probability of not having it is q = 1 - p, then the numerical value of the 
probability that exactly t elements in a set of n elements (where 
$t \le n$) have the property S while the remaining n - t elements do not 
have S is given by 

$$P_{n,t} = \frac{n!}{t!\,(n-t)!}\, p^{t} q^{\,n-t}, \qquad t = 0, 1, 2, \ldots, n; \quad p = p(S) = \text{constant for group of trials} \qquad (2.03)$$

This important theorem is termed the binomial law. $P_{n,t}$ is the general 
term in the binomial expansion of 

$$(q + p)^n$$

With p and n fixed, the value of $P_{n,t}$ varies with t. Its maximum 
value is given when t satisfies the condition 

$$pn + p > t > pn + p - 1 \qquad (2.04)$$

When n is very large, the value of t which gives a maximum may be 
taken as pn. This value indicates that the probability of sets with n 
successive elements which contain exactly t elements with the property S 
is largest when t is approximately equal to pn, or that the proportion of 
S’s in a set of n elements is approximately equal to the limit of the 
relative frequency of S in the Kollektiv. 
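
Formula (2.03) and condition (2.04) can be checked numerically. The sketch below uses illustrative values n = 20 and p = 0.3, which are assumptions and not taken from the text; the function names are likewise chosen here.

```python
from math import comb


def binomial_probability(n, t, p):
    """Formula (2.03): probability of exactly t successes in n trials."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def most_probable_t(n, p):
    """Value of t for which P_{n,t} is largest, found by direct search."""
    return max(range(n + 1), key=lambda t: binomial_probability(n, t, p))


n, p = 20, 0.3  # illustrative values only
t_star = most_probable_t(n, p)
total_probability = sum(binomial_probability(n, t, p) for t in range(n + 1))
print(t_star, total_probability)
```

Here pn + p = 6.3, so condition (2.04) singles out t = 6, in agreement with the direct search; the terms also sum to 1, as they must for the expansion of $(q + p)^n$.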

Equation (2.03) is a special case of a more general theorem dealing 
with situations in which not only two results are considered but in which 
the event may occur in k ways with probabilities $p_1, p_2, \ldots, p_k$. 
Then, for a random sample of N from a multinomial distribution, it can be 
shown that the probability $P_k$ of N giving $n_1$ of the first kind, 
$n_2$ of the second, . . . , $n_k$ of the last, is 

$$P_k = \frac{N!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \qquad (2.05)$$

which is the general term in the multinomial expansion of 

$$(p_1 + p_2 + \cdots + p_k)^N, \qquad N = n_1 + n_2 + \cdots + n_k$$
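
Formula (2.05) can be evaluated directly. In the sketch below the probabilities 1/2, 1/3, 1/6 and the counts are hypothetical values chosen for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_probability(counts, probs):
    """Formula (2.05) for counts n_1, ..., n_k and probabilities p_1, ..., p_k."""
    coefficient = factorial(sum(counts))
    for n_i in counts:
        coefficient //= factorial(n_i)  # exact at every step
    return coefficient * prod(p**n_i for p, n_i in zip(probs, counts))


# Hypothetical case: k = 3 kinds of result, N = 3 trials, one of each kind.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p_one_of_each = multinomial_probability([1, 1, 1], probs)

# With k = 2 the formula reduces to the binomial law (2.03).
p_binomial_case = multinomial_probability([2, 1], [Fraction(1, 2), Fraction(1, 2)])
print(p_one_of_each, p_binomial_case)
```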

The Poisson Distribution. An important distribution of the 
discontinuous type which often describes the facts of observations is one 
where p, or the probability of an event, is very small, but where a large 
number of cases or trials, n, are taken so that pn is finite but small. 
The number of occurrences will be distributed in the Poisson series. 
Thus, 

$$p \to 0, \qquad q \to 1, \qquad n \to \infty, \qquad np \text{ remains finite} = m = \text{mean}$$

$$\text{Mean} = m, \qquad \text{Variance} = npq \to m$$

The distribution is therefore determined by one parameter. 

If t = 0, 1, 2, . . . , the relative frequency with which the values 
occur is given by the series 

$$e^{-m}, \quad \frac{m\,e^{-m}}{1!}, \quad \frac{m^{2} e^{-m}}{2!}, \quad \ldots, \quad \frac{m^{x} e^{-m}}{x!}, \quad \ldots \qquad \text{for } t = 0, 1, 2, \ldots, x, \ldots \qquad (2.06)$$

This series is known as “Poisson’s limit to the binomial,” “the Poisson 
series,” or “the law of small numbers.” Probability tables for the 
distribution are given by Pearson (Ref. 5). 
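
The sense in which (2.06) is a limit of the binomial can be seen numerically. In this sketch the mean m = 0.61 and the number of trials n = 10,000 are arbitrary illustrative values, not figures from the text.

```python
from math import comb, exp, factorial


def poisson_term(t, m):
    """General term of the Poisson series (2.06)."""
    return m**t * exp(-m) / factorial(t)


def binomial_term(n, t, p):
    """General term of the binomial law (2.03)."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


m = 0.61      # illustrative mean (assumption)
n = 10_000    # large number of trials, so that p = m / n is very small
p = m / n

for t in range(5):
    print(t, round(binomial_term(n, t, p), 6), round(poisson_term(t, m), 6))

max_gap = max(abs(binomial_term(n, t, p) - poisson_term(t, m)) for t in range(6))
```

The two columns agree closely, and the agreement improves as n is increased with np held at m.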

The Normal Distribution. The binomial law basic in the theory of 
probability is exact, but it possesses the distinct disadvantage of 
involving much labor, particularly in the computation of the factorials that 
enter in the term $P_{n,t}$ [see (2.03)] when n is large. Furthermore, it is a 
theoretical distribution of the discontinuous or discrete form. When 
the character is continuous, as is very often the case in measurements in 
science, a curve is essential in describing such continuous variation. 

It can be shown that by a series of approximations an analytic formula 
can be obtained from Equation (2.03) which takes the form 

$$P_t = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\delta^{2}/2\sigma^{2}}, \qquad \text{where } \delta = t - np, \quad \sigma = \sqrt{npq} \qquad (2.07)$$

and the graph of $P_t$ as a function of $\delta$ is a symmetrical, 
bell-shaped curve variously called the normal distribution curve, the 
Gaussian curve, or the Laplacian-Gaussian error curve. Since the maximum 
value of the exponential $e^{-x}$, for $x \ge 0$, is unity, it is noted that 
the normal approximation for the probability that t will assume its most 
probable value is given by $1/(\sigma\sqrt{2\pi})$, or in terms of the 
binomial parameters, $1/\sqrt{2\pi npq}$, where q = 1 - p. It is obvious 
that the normal approximation gives the closest fit to the binomial when 
p = q (see page 58). 
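
The closeness of (2.07) to the exact binomial term can be examined for a particular case. The values n = 100 and p = 0.5 below are illustrative assumptions, chosen because the text notes that the fit is closest when p = q.

```python
from math import comb, exp, pi, sqrt


def binomial_term(n, t, p):
    """Exact binomial probability, formula (2.03)."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def normal_approximation(n, t, p):
    """Formula (2.07), with delta = t - np and sigma = sqrt(npq)."""
    q = 1 - p
    sigma = sqrt(n * p * q)
    delta = t - n * p
    return exp(-(delta**2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


n, p = 100, 0.5  # illustrative values only
exact_peak = binomial_term(n, 50, p)
approx_peak = normal_approximation(n, 50, p)  # equals 1 / sqrt(2 * pi * n * p * q)
print(round(exact_peak, 5), round(approx_peak, 5))
```

At the most probable value, t = np = 50, the exponential factor is unity and the approximation reduces to $1/\sqrt{2\pi npq}$.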

In addition to the normal curve being the limiting form of the binomial 
distribution, as well as of certain other distributions, its usefulness in 
theory and practice is especially enhanced by the central limit theorem. 
According to this theorem, under certain conditions the sum of n 
independent random variables, in whatever form they may be distributed, 
tends to be distributed, when expressed in standard measure, as the 
normal distribution when $n \to \infty$. Another important property of the 
normal distribution is its reproductive property. For example, a linear 
function of variates that are normally distributed is itself normally 
distributed. 
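
The central limit theorem can be illustrated, though of course not proved, by simulation. The sketch below sums uniformly distributed variables, which are far from normal individually; the choice of the uniform distribution, the number of terms, the number of sums, and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def standardized_sums(n_terms, n_sums, seed=0):
    """Sums of n_terms uniform(0, 1) variables, expressed in standard measure."""
    rng = random.Random(seed)
    mu, var = 0.5, 1 / 12  # mean and variance of a single uniform(0, 1) variable
    scale = (n_terms * var) ** 0.5
    return [
        (sum(rng.random() for _ in range(n_terms)) - n_terms * mu) / scale
        for _ in range(n_sums)
    ]


z = standardized_sums(n_terms=30, n_sums=20_000)
z_mean = mean(z)
z_sd = pstdev(z)
share_within_1_96 = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(z_mean, 3), round(z_sd, 3), round(share_within_1_96, 3))
```

For a normal distribution in standard measure about 95 per cent of values lie within ±1.96, and the simulated share is usually close to that figure.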

One of the earliest applications of probability was to the 
systematization of measurements and observations in the physical sciences, 
particularly astronomy. Legendre in 1806 had formulated what has become 
known as the principle of least squares: When a set of empirical 
observations is used to establish the constants of a mathematical function, the 
best solution is that which reduces the sum of the squares of the residual 
errors to a minimum. This principle was later placed on a definite 
mathematical and logical basis by the work of Gauss, Laplace, Maxwell, 
and others. That is, the normal curve, although previously formulated 
by de Moivre, was developed as a useful mathematical tool. Since it 
was used by Gauss to describe the distribution of “errors,” it was spoken 
of as the “normal curve of error.” The distribution curve is useful, 
however, in many situations which have nothing to do with “errors,” as in 
the original setting in which it was used. It is useful in dealing with 
variations of different kinds, especially with experimental and other 
observational results, as in the biological sciences. 

Serious attempts were made, particularly by Quetelet, to apply the 
theory of probability to social statistics. He popularized the idea of the 
“average man” as computed from extensive statistics which he collected. 
It was through analogy of the average man to the center of gravity in 
mechanics that he assumed human actions or traits as occurring in 
accordance with the operation of laws giving rise to a normal distribution. 
Unfortunately, this attempt, although the influence of Quetelet soon 
became very slight, seemed to have established the use of the term 
“normal” in connection with a law of distribution, presuming that 
measurements should always be expected to follow the “normal law of 
errors” as if it were a law of nature. Though later developments have 
shown that in science the normal curve gives at times a very close 
approximation to the observed facts, these instances of very close 
approximations are the exception rather than the rule. 

In dealing with the distributions of errors of measurement or 
observation, the normal law of error was derived under the assumption that 
deviations from the most probable value are fortuitous, meaning that 
the forces in operation to produce them could not be resolved into more 
elemental factors. It was assumed that the deviations were as likely to 
be positive as negative, and that they varied without limit, that is, within 
the bounds of $\pm\infty$. Laplace’s generalization was that the distribution 
obtained by the repetition of a great number of identical alternatives is 
represented by the function $e^{-x^{2}}$, such that the ordinates of the normal 
curve decrease on both sides of the maximum ordinate in such a way 
that their logarithms are proportional to the squares of the distances 
from the center. Extending this idea to fluctuations other than so-called 
“error,” it may be said that, if an observation, say x, is a resultant of the 
sum of the effects of a large number of small causes operating at random, 
and if each effect is independent of x, the obtained distribution is expected 
to be normal. 

The normal distribution holds a central place in the theory of sampling 
as well as in the theory of probability. 

Problems 

Exercises 1-9 are based on the assumption of a normally distributed 
population. 

1. What proportion of the total number of cases lies between one and 
two standard deviations (S.D.) above the mean? 

2. What is the probability of obtaining a value of the variate in 
random selection at least as large as +1.96 S.D.? 

3. What proportion of the area under the normal curve lies between 
1.27 S.D. and 1.33 S.D.? lies above 1.3 S.D.? lies above -1.3 S.D.? 
lies below 2.1 S.D.? 

4. What is the probability that a measure will lie in the range 2.5 S.D. 
to 3.1 S.D.? 

5. What is the probability of obtaining an absolute value of $x/\sigma$ 
greater than 1.5? 

6. What is the relative length of the ordinate cutting off the lowest 
12.1 per cent of the area? 

7. A variate is normally distributed with mean 13.5 and S.D. 3.6. 
(a) What measures selected at random might be expected to occur in not 
more than 5 per cent of the cases? in not more than 1 per cent of the 
cases? (b) What is the probability of obtaining a value of 15? of 8? 

8. A variable is normally distributed with unit standard deviation. 
The probability of obtaining a value of 15 or greater from the 
population is .132. What is the mathematical expectation of the means of 
random samples? 

9. A population has a mean of 37.6. It is found that 95 per cent of the 
values of the variate lie in the range 27.8 to 47.4. What values of 
the variate will occur with a probability of .01 or less? 

10. Insofar as the theory of statistics is concerned, upon what does the 
concept of probability depend for its meaning? 

11. In rolling a die, the variable X takes on values 1, 2, 3, 4, 5, and 6. 
If the die is unbiased, show that $E(X)$ is 3.5, $E(X^2)$ is 15.167, and 
the standard deviation of X is 1.708. 

12. In the classical example of the Poisson distribution given by 
Bortkiewicz, the records of 20 army corps over a period of 10 years 
furnish 200 observations of the number of men killed by the kick of a 
horse. If the number of deaths is denoted by the variable X, which 
takes the values 0, 1, 2, 3, and 4 with frequencies 109, 65, 22, 3, and 1, 
show that the mean is approximately equal to the variance. Find the 
theoretical frequencies. 

13. Calculate the frequency of girls in 100 families of 3 children each; 
p = .49. 

14. Find the number of different committees, each of 3 persons, that can 
be selected from 5 individuals. 

CHAPTER III 

SAMPLING DISTRIBUTIONS 


If the tools designed by the mathematical statistician are to be used 
intelligently and efficiently by the research worker, the former cannot 
evade the responsibility of setting forth clearly and unequivocally the 
conditions under which the use of each tool is valid and efficient. Where 
the statistician has done his part, it is the responsibility of the research 
worker to determine whether the necessary conditions obtain in his 
particular case. It should be pointed out that other tools are generally 
required to test whether or not these conditions hold good. The 
command of these tools is an indispensable part of the researcher’s art. 
Once it has been established that the assumptions have been fulfilled, he 
can proceed with confidence in the results. 

So that the student may gain an insight into the logic and reasoning 
underlying the problems of drawing valid conclusions from experimental 
results, we present a number of commonly used models developed by the 
statistician for such purposes. It should be emphasized that the ability 
to distinguish the specific use or uses for each of the models will go a 
long way toward developing the kind of statistical craftsmanship essential 
in the modern research worker. 

Preliminary Notions on Sampling and Inference. The material out 
of which the statistician constructs his model for practical use in 
interpreting experimental results is discovered by noting what happens when 
sample after sample is taken from the same population. It is noted, of 
course, that the results usually differ from one sample to another. Since 
the method of selection is kept uniform throughout the sampling process, 
these discrepancies can logically be assigned only to the process, because 
clearly the population remains constant. It is proper, therefore, to 
speak of the fluctuations from sample to sample as sampling errors. 
These sampling or chance errors, as they are sometimes called, are found 
to follow chance laws; that is, though all together they form a uniform 
result, the value any sample might have cannot be accurately predicted. 
The individual deviations are unanalytic; that is, the forces operating 
to bring them about are incapable of resolution into simpler and 
identifiable components. Out of these sampling errors the statistician makes 
his model. Against such a standard it becomes possible to compare the 
experimental results. Since it is possible to measure the amount of 
sampling error to be expected in any given case, it is necessary only to 
note whether or not the experimental results conform with the standard, 
that is, to compare the relative magnitude of the experimental results and 
their random sampling errors. In this comparison, if we note that the 
observed results, namely, the estimate of an effect presumed to exist, 
could seldom (say once in a thousand trials, once in a hundred, or once 
in twenty) be as large or larger owing to random errors of sampling 
alone, then the effect is said to be real in the sense that it is not likely 
to be due to sampling errors alone, and the experimental results are said 
to be significant. On the other hand, if it is found that often (for instance, 
fifty times in one hundred, one time in five, or even once in ten, and so 
forth) results as large or larger could be obtained that would be 
attributable to random sampling errors alone, they are said to be insignificant. 
Ordinarily, the basis of determining whether results are significant or 
insignificant is as follows: 

(1) The results are said to be significant if the conclusion that they 
are would be erroneous in 1 per cent or less of the cases. 

(2) The results may be significant but further observations are 
necessary (that is, we suspend judgment) if the conclusion that the 
results are significant would be wrong in 5 per cent or less but 
more than 1 per cent of the cases. 

(3) The results are not significant if our conclusion that they are 
significant would be in error in more than 5 per cent of cases. 

The technical term for the process employed in examining the 
significance of experimental or observational results is “the test of 
significance.” This process will be discussed much more completely in 
Chapter IV, The Testing of Statistical Hypotheses. 
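
The three-way convention just stated can be written as a small decision rule. This is a sketch only: the function name and the labels it returns are invented for illustration, and its argument is taken to be the proportion of cases in which results as large or larger would arise from sampling errors alone.

```python
def classify_significance(chance_proportion):
    """Apply the 1 per cent and 5 per cent conventions to a proportion in [0, 1]."""
    if not 0.0 <= chance_proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    if chance_proportion <= 0.01:
        return "significant"
    if chance_proportion <= 0.05:
        return "suspend judgment"
    return "not significant"


for value in (0.001, 0.03, 0.20):
    print(value, classify_significance(value))
```

The boundaries follow the wording of the list: 1 per cent or less falls in the first class, and more than 1 but not more than 5 per cent falls in the second.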

The examples of empirical sampling experiments given below illustrate 
successive stages by which the statistician builds up the statistical models 
to be used in interpreting experimental results. This method, the way 
in which the earlier statisticians worked, provides a simple way of 
understanding quite rigorously the theoretical foundation underlying statistical 
inference. Today it is not usually necessary to do an actual experiment 
in order to construct these statistical models based on sampling errors, 
since the theory of probability enables the statistician to deduce sampling 
distributions theoretically. In fact, the theoretical deduction of the 
sampling distributions of the numerous statistical quantities now in use 
is a highly specialized branch of mathematical statistics. This deduction 
is sometimes a problem of great mathematical difficulty. Particularly 
when new types of observational data are under consideration or where 
information of new kinds is under search, the mathematical problems 
at times have proved to be so formidable that the statisticians have had 
to rely on actual sampling experience. Although the mathematical 
derivations are of fundamental importance to statistical theory and 
practice, it should be apparent that the conclusions to be drawn from such 
mathematical models would have no justification beyond the fact that 
they agree with what actually happens experimentally or would be 
arrived at from these simple sampling experiments. Theory is a tool 
which is tested through application and whose usefulness is decided in 
connection with the application. 


TABLE 2 

100 Random Samples of 5 for a Variable X from a Population with Mean 30 and 
Standard Deviation 10 

(Entries are listed in the order in which they survive in the source; the 
original arrangement into samples of 5 is not recoverable, and -- marks an 
entry that is illegible.) 

25  19  26  23  25  50  34   5  17  25 
29  38  30  34  24  18  28  35  35  29 
33  19  32  22  44  18  19  21  23  26 
20  --  --  24  25  30  22  30  29  43 
36  27  17  35  40  34  19  19  23  28 
14  38  22  37  24  33  33  27  47  22 
28  26  31  32  --  11  --  24  42  29 
47  24  43  16  21  38  35  42  24  47 
14  38  28  27  32  26  21  29  36  20 
 9  46  29  21  33  26  --  --  22  27 
23  44  --  34  35  42  31  34  37  30 
12  30  44  37  35  55  17  25  22  25 
28  25  33  37  46  18  19  32  42   6 
--  27  12  29  23  47  37  28  28  27 
23  32  24  19  21  23  26  39  53  38 
38  39  30  27  27  45  27  28  29  27 
21  41  51  31  38  32  18  36  25  27 
33  48  25  32  26  38  42  29  19  22 
49  25  17  27  33  --  35  34  34  26 
31  34  15  28  --  31  14  28  --  35 
18  --  26  16  27  23  34  21  --  39 
19  28  25  14  21  31  22  21  45  18 
32  36  36  28  39  41  32  38  24  38 
36  19  31  31  33  27  19  43  31  22 
 6  33  58  32  21  35  26  38  33  29 
29  49  --  41  27  38  47  33  23  24 
36  21  44  35  53  32  23  47  44  26 
51  45  --  --  33  --  31  51  31  31 
43  19  35  35  26  --  36  35  24  43 
31  15  19  36  39   8  44  23  28  39 
27  46  38  46  27  20  42  --  28  27 
29  23  --  15  --  38  23  27  21  22 
11  39  36  21  --  25  26  12  22  26 
38  45  34  36  45  23  16  38  17  36 
27  34  32  25  38  35  12  26  47  43 
18  38  45  36  37  --  25  24  25  23 
26  28  42  37  27  32  15  26  42  26 
39  29  37  44  34  --  18  44  31  -- 
--  31  36  33  34  --  --  33  42  43 
35  45  28  38  44  26  24  48  37  35 
34  34  22  19  34  32  29  32  22  49 
27  36  26  24   1  25  21  28  18   7 
27  32  35  43  46  26  26  32  25  45 
31  39  --  20  34  27  19  28  34  16 
18  38  --  36  19  25  13  27  36  17 
23  36  25  29  24  54  15  24  10  28 
37  --  24  27  27  14  31  34  27  29 
35  38  33  21  32  36  22  26  14  28 
28  --  33  36  22  34  29  38 



The Sampling Distribution of the Mean. The first sampling 
experiment to be described deals with the arithmetic mean. To illustrate the 
way in which random sampling errors arise, we set up a normal population 
of values of some character, say X, whose mean is taken as known to be 
30 and whose standard deviation is 10; that is, $\mu = 30$, $\sigma = 10$.¹ 
Samples of 5 (n = 5) were chosen at random from the population. By this 
method, 100 samples of 5 for the variable X were obtained. The 
individual values for each of the 5 members of each sample are recorded 
for the 100 samples in Table 2. Note the range in values in the respective 
samples. For example, in one sample the range in the X-values is from 
5 to 47; in another, from 25 to 33.² 

¹ It is conventional to speak of the true values of the population as parameters 
and to denote them by Greek letters. Correspondingly, Roman letters are the 
symbols used for the estimates made of parameters or population values from samples. 

Next we computed the mean of each of the 100 samples. These 
values are recorded in Table 3. The 100 means vary between 19.4 and 
40.6, and both the highest and lowest means differ from the population 
mean of 30 by 10.6. Obviously, the means are much less scattered than 
are the individual values. These fluctuations in mean values are known 
as sampling errors. The amount of sampling error in each mean is the 
difference between it and the population value, that is, 30. The biggest 
error with which any one sample of 5 estimates the population mean is 10.6.
The smallest error is found to be 0.2: for instance, the difference between 
29.8 and 30.0. None of the 100 estimates is without sampling error. 
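The experiment just described can be imitated on a computer. The following is only a sketch under stated assumptions: it draws pseudo-random normal values with Python's standard library instead of using published tables of random samples, it does not round the observations to whole numbers, and the seed and the function name `draw_sample_means` are arbitrary choices made for illustration. Its output will therefore not reproduce Table 3 value for value.

```python
import random
import statistics

MU, SIGMA = 30.0, 10.0      # population parameters given in the text
N_SAMPLES, N = 100, 5       # 100 samples of 5, as in Table 2


def draw_sample_means(seed=1):
    """Draw N_SAMPLES random samples of size N and return their means."""
    rng = random.Random(seed)   # arbitrary seed; it only makes the run repeatable
    means = []
    for _ in range(N_SAMPLES):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        means.append(statistics.fmean(sample))
    return means


means = draw_sample_means()
errors = [m - MU for m in means]     # sampling error of each mean

print("smallest mean:", round(min(means), 1))
print("largest mean: ", round(max(means), 1))
print("largest absolute sampling error:", round(max(abs(e) for e in errors), 1))
print("mean of the 100 means:", round(statistics.fmean(means), 1))
```

Changing the seed gives a different set of 100 means, which is itself a reminder that Table 3 is one of many possible outcomes of the same experiment.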

TABLE 3 

The 100 Mean Values of the 100 Samples of 5 Recorded in Table 2 


23.0   38.6   21.6   34.8   33.6   31.2   29.8   33.8   29.6   23.2
36.8   27.0   28.6   29.0   23.8   35.0   30.6   32.0   23.6   28.4
27.4   28.4   29.8   32.2   28.0   29.2   31.2   34.6   31.8   31.0
32.8   27.6   30.8   40.6   32.0   27.8   26.4   38.0   30.4   27.2
39.0   32.2   27.2   29.8   23.8   21.8   27.0   26.4   28.2   21.6
32.2   34.6   33.2   22.6   35.8   22.6   25.6   29.2   32.4   37.8
31.8   32.4   30.6   28.8   32.8   37.8   28.0   33.8   22.4   23.8
25.6   26.6   37.6   31.4   25.6   21.8   24.0   24.6   38.8   29.4
23.4   31.6   32.0   32.6   32.2   24.6   31.4   24.2   37.6   29.2
31.8   31.8   28.2   26.0   26.0   19.4   35.0   38.4   31.4   32.6


The small samples of 5 give sampling errors greater than would larger 
samples. Had we taken samples of 50, for instance, the means would 
have been less scattered, indicating smaller sampling errors. This 
tendency toward less variation among sampling means and correspondingly
smaller differences between sample means and the true mean, and
thus a smaller sampling error, would continue as the size of the sample 
became larger and larger. For example, by calculating the mean of the 
100 sample means, we obtain the mean of a single sample of 500 equal to 
29.8, a value very close to the population mean of 30.0. 

For example, the estimated mean is X̄; the estimated standard deviation, s. This
convention is followed throughout this book.

² Mahalanobis (see Ref. 5) and others have given tables of random samples from a
normal distribution and have shown how to use these tables to get samples of any size
for any mean and standard deviation. We have followed this method in several of
the empirical sampling experiments described.

For a sample of a given size, the errors of random sampling increase
as the variation among individuals in the population becomes greater.
In fact, the standard error of the mean is directly proportional to the
standard deviation of the population. As an extreme case, it is obvious that, had
there been no variation among individuals in the population sampled
and had they all been 30, the means irrespective of sample size would
have been 30. Hence, there would have been no sampling errors.
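Both remarks, that larger samples and less variable populations give smaller sampling errors, follow from the standard error of the mean, σ/√n, which is stated later in this chapter. The short sketch below simply evaluates that expression; the function name is an assumption made for illustration.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard deviation of the mean of n independent observations."""
    return sigma / math.sqrt(n)


# Larger samples from the same population (sigma = 10) give smaller errors.
for n in (5, 50, 500):
    print(n, round(standard_error_of_mean(10.0, n), 3))

# Doubling the population standard deviation doubles the standard error;
# a population with no variation gives no sampling error at all.
print(standard_error_of_mean(20.0, 5) / standard_error_of_mean(10.0, 5))
print(standard_error_of_mean(0.0, 5))
```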

The means in Table 3 may be arranged into a frequency distribution,
thus showing the number or frequency of means falling between limits
as noted on the base scale (Table 4). This frequency distribution of
means is presented in the form of a histogram in Fig. 1. Measures of
central location and variability for this frequency distribution of means,
which is called the sampling distribution of means, can be calculated.
It is to be expected that the mean of the 100 means should be the same
as the mean of the population being sampled. In our case, the mean
of the distribution is found to be 29.8, which agrees closely with 30, the
true mean. By increasing the size of the samples, the observed value
would become almost exactly 30. The standard deviation of the sampling
distribution of means gives an estimate of the size of sampling errors,
thus summing up the information concerning the whole distribution of
errors. If the standard deviation of the sampling distribution is large,
the errors of sampling are, as a whole, large. Correspondingly, if the
standard deviation is small, the errors are small. The standard deviation
of the frequency distribution of means in Table 4, calculated in the usual
manner, is 4.82.

Figure 1. Distribution of the means, X̄'s, of 100 random samples of 5 from a normal
population with mean 30 and standard deviation 10. Normal curve superimposed
upon the histogram.

As was pointed out earlier, the statistician does not usually carry
out this very simple and tedious sampling procedure, since application
of the mathematical theory of probability enables him to determine
theoretically the sampling distribution and the standard error of a
statistic. The application of known mathematical laws gives results
that are as accurate as a sampling experiment using millions of samples.
Hence, the method of mathematical deduction is at the same time less
laborious and more accurate. If the samples are all drawn from a
normal population under ideal random sampling conditions, it is known,
as in our case, that the sample means are normally distributed about the
population mean with a standard deviation equal to σ/√n, where σ
denotes the value of the population standard deviation and n, the number
of sampling units. Even if the variable is not normally distributed in the
population, it is known that the distribution of totals, or of the means,
tends toward normality as the size of the sample is increased.
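The last statement can be examined with a small simulation. The sketch below is an illustration only and is not one of the experiments reported in this chapter: the exponential population (mean 1, standard deviation 1), the seed, the number of samples, and the function names are all assumptions chosen for convenience. It compares the standard deviation of the simulated means with σ/√n and reports the skewness of the means, which should shrink toward zero as n grows.

```python
import math
import random
import statistics

rng = random.Random(7)      # arbitrary seed, for repeatability only


def means_of_samples(n, n_samples=5000):
    """Means of n_samples samples of size n from an exponential population
    with mean 1 and standard deviation 1 (a markedly skewed population)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(n_samples)]


def skewness(values):
    """Moment coefficient of skewness; zero for a symmetrical distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


for n in (2, 10, 50):
    means = means_of_samples(n)
    print(n,
          "sd of means:", round(statistics.pstdev(means), 3),
          "sigma/sqrt(n):", round(1 / math.sqrt(n), 3),
          "skewness:", round(skewness(means), 2))
```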


TABLE 4 

Frequency Distribution of the 100 Mean Values for the Samples of 5 Given in 
Table 3 and the Test of Goodness of Fit 


Class interval

 41.95 to  ∞
 39.95 to 41.95
 37.95 to 39.95
 35.95 to 37.95
 33.95 to 35.95
 31.95 to 33.95
 29.95 to 31.95
 27.95 to 29.95
 25.95 to 27.95
 23.95 to 25.95
 21.95 to 23.95
 19.95 to 21.95
 17.95 to 19.95
   -∞  to 17.95

Total

9 d.f.; P > .35
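The grouping that underlies a table of this kind can be sketched as follows. The 100 means are those printed in Table 3 and the class limits are those of Table 4; the divisor of 100 used for the grouped variance and the function name are assumptions, since the text says only that the standard deviation was calculated "in the usual manner."

```python
import math

# The 100 sample means of Table 3, in the order printed.
MEANS = [
    23.0, 38.6, 21.6, 34.8, 33.6, 31.2, 29.8, 33.8, 29.6, 23.2,
    36.8, 27.0, 28.6, 29.0, 23.8, 35.0, 30.6, 32.0, 23.6, 28.4,
    27.4, 28.4, 29.8, 32.2, 28.0, 29.2, 31.2, 34.6, 31.8, 31.0,
    32.8, 27.6, 30.8, 40.6, 32.0, 27.8, 26.4, 38.0, 30.4, 27.2,
    39.0, 32.2, 27.2, 29.8, 23.8, 21.8, 27.0, 26.4, 28.2, 21.6,
    32.2, 34.6, 33.2, 22.6, 35.8, 22.6, 25.6, 29.2, 32.4, 37.8,
    31.8, 32.4, 30.6, 28.8, 32.8, 37.8, 28.0, 33.8, 22.4, 23.8,
    25.6, 26.6, 37.6, 31.4, 25.6, 21.8, 24.0, 24.6, 38.8, 29.4,
    23.4, 31.6, 32.0, 32.6, 32.2, 24.6, 31.4, 24.2, 37.6, 29.2,
    31.8, 31.8, 28.2, 26.0, 26.0, 19.4, 35.0, 38.4, 31.4, 32.6,
]

LOWER, WIDTH = 17.95, 2.0   # class limits 17.95, 19.95, ..., as in Table 4


def frequency_distribution(values):
    """Count the values falling in each class of width WIDTH."""
    counts = {}
    for v in values:
        k = math.floor((v - LOWER) / WIDTH)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


freq = frequency_distribution(MEANS)
total = sum(freq.values())
midpoint = {k: LOWER + (k + 0.5) * WIDTH for k in freq}
raw_mean = sum(MEANS) / len(MEANS)
grouped_mean = sum(f * midpoint[k] for k, f in freq.items()) / total
# Assumption: divisor N = 100 for the grouped variance.
grouped_var = sum(f * (midpoint[k] - grouped_mean) ** 2 for k, f in freq.items()) / total
grouped_sd = math.sqrt(grouped_var)

for k, f in freq.items():
    print(f"{LOWER + k * WIDTH:5.2f} to {LOWER + (k + 1) * WIDTH:5.2f}: {f}")
print("mean of the raw means:", round(raw_mean, 2))
print("grouped mean:", round(grouped_mean, 2))
print("grouped standard deviation:", round(grouped_sd, 2))
```

Because the printed means carry one decimal place and the class limits end in .95, no value can fall exactly on a class boundary.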


In our case, then, for samples of 5 from the known normal population,
we expect the sample means to be normally distributed about 30 with a
standard deviation of σ/√n = 10/√5 = 4.472. Our empirical results,
that is, X̄ = 29.8 and s_X̄ = 4.82, seem to be in close agreement. It is
also noted (Table 4) that the observed distribution of means seems to
agree very well with the theoretical values deduced on the above theory
using the mean and standard deviation calculated from the population
values. Even with samples as small as 5, the agreement between
observation and expectation seems close. We need, of course, a more
careful definition of what is meant by “seemingly close.” This is given
by the chi-square (χ²) test for goodness of fit.³ Referring to the χ²-table
(Table III, Appendix) with 9 degrees of freedom, we note that for
χ² = 9.484, P > .35. Therefore, we accept the
hypothesis that our 100 mean values (X̄'s) are normally distributed.

The Sampling Distribution of the Difference Between Means. Our
second sampling experiment deals with the differences between the means
of random samples. Here we have taken 100 random samples of size 5
for a pair of variables, say X₁ and X₂, which are independent of each
other. We know that our parent population of X is normally distributed
with mean μ = 30 and standard deviation σ = 10.

The mean-difference values of X₁ and X₂ for the 100 samples have
been calculated and recorded in Table 5.


TABLE 5

The Mean-Difference Values of X₁ and X₂ for the 100 Samples of 5


- 4.8 

6.4 

-12.6 

0.6 

12.6 

- 1.4 

0 

4.6 

- 0.6 

- 1.4 

15.0 

0.2 

4.6 

- 3.4 

- 0.4 

- 2.8 

0.4 

-1.6 

0.6 

- 8.4 

- 3.4 

- 1.4 

- 2.8 

- 1.0 

- 1.0 

- 0.8 

1.4 

3.6 

5.0 

5.2 

0.6 

0.8 

- 3.0 



-15.8 

- 2.8 

10.6 

1.4 

3.6 

7.0 

-4.2 

- 3.6 


-11.6 

-17.4 

- 8.4 

0.4 

0.6 

- 9.2 

- 0.4 

4.0 

6.2 

4.4 

16.4 

- 3.2 

-12.8 

- 4 .S 

- 2.2 

12.0 

1.4 

7.4 

0.2 

- 3.4 

4.0 

8.0 

9.4 

- 6.0 

-1.0 

- 6.0 

- 4.2 

-5.6 

13.8 

5.4 

1.6 

- 7.4 

- 6.2 

- 5.2 

7.0 

1.2 

1.2 

3.8 

13.0 

5.4 

3.4 

- 2.8 

5.2 

- 2.0 

14.2 

~ 1.4 

1.0 

-4.0 

- 6.2 

- 7.4 

- 5.6 

-17.6 

3.0 

2.8 

8.0 

0.6 


From sampling theory it is known that the mean-difference values,
(X̄₂ − X̄₁)'s, are normally distributed about a mean of 0 with standard
deviation σ√(2/n), where σ is the population standard deviation and n
denotes the sample size.

The mean-difference values in Table 5 are arranged in a frequency
distribution in Table 6. We find the mean of the mean-difference values
to be d̄ = .39, and the standard deviation, or standard error, of the mean
differences to be s_d = 6.736. The corresponding parameter values are
μ_d = 0 and σ_d = 6.325. The observed values are well within the limits
of sampling error.
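The parameter value 6.325 quoted above follows directly from σ√(2/n). A minimal sketch of the arithmetic, written for the general case of two independent means (the function name is an assumption made for illustration):

```python
import math


def se_of_mean_difference(sigma1, n1, sigma2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


se = se_of_mean_difference(10.0, 5, 10.0, 5)
print(round(se, 3))                        # general form with equal sigmas and sizes
print(round(10.0 * math.sqrt(2 / 5), 3))   # sigma * sqrt(2/n), the form used in the text
```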

Again we wish to test the goodness of fit of the normal distribution.
The theoretical frequencies (f_t) were calculated and are given in Table 6.
Chi-square is the appropriate test of goodness of fit of the theoretical
and observed distributions. Its value is found to be 10.985. We
enter the χ²-table (Table III, Appendix) with 9 degrees of freedom and
χ₀² = 10.985. The corresponding probability value is P > .27. Hence,
we may conclude that the 100 mean-difference values are normally
distributed in accordance with sampling theory.

³ See page 96.

The Sampling Distribution of the Variance. Our third sampling
experiment deals with the variance. Samples of 5 were chosen at random
from a normal population with mean μ = 30 and variance σ² = 100.
The sum of the squares of the deviations of the observational values X_ij


TABLE 6

Frequency Distribution of the 100 Mean-Difference Values for the Samples
of 5 Given in Table 5 and the Test of Goodness of Fit

Class interval        f_o   f_o (pooled)    f_t    f_o - f_t   (f_o - f_t)²   (f_o - f_t)²/f_t

 17.95 to   ∞           0        9          5.79      3.21       10.3041          1.780
 15.95 to  17.95        1
 13.95 to  15.95        2
 11.95 to  13.95        5
  9.95 to  11.95        1
  7.95 to   9.95        3        8         11.55     -3.55       12.6025          1.091
  5.95 to   7.95        5
  3.95 to   5.95       11       11          9.26      1.74        3.0276          0.327
  1.95 to   3.95        6        6         11.31     -5.31       28.1961          2.493
 -0.05 to   1.95       19       19         12.41      6.59       43.4281          3.499
 -2.05 to  -0.05       13       13         12.38      0.62         .3844          0.031
 -4.05 to  -2.05       12       12         11.19      0.81         .8472          0.076
 -6.05 to  -4.05        9        9          9.18     -0.18         .0324          0.004
 -8.05 to  -6.05        5        7         11.33     -4.33       18.7489          1.655
-10.05 to  -8.05        2
-12.05 to -10.05        1        6          5.60      0.40         .1600          0.029
-14.05 to -12.05        2
-16.05 to -14.05        1
-18.05 to -16.05        2
   -∞  to -18.05        0

Total                 100      100        100.00      0.00                    χ₀² = 10.985

d.f. = 9; .30 > P > .20


from their mean X̄_i for each of the 100 samples was obtained, and these
values are recorded in Table 7.

We have calculated the variance of each sample by dividing the sum
of the squares of the deviations of the observation values X_ij from X̄_i by
n = 5. These estimates are called Pearsonian. Thus,

\[ s_{i(p)}^2 = \frac{\sum_j (X_{ij} - \bar{X}_i)^2}{n} \tag{3.01} \]

where X̄_i is the mean of X for the ith sample and X_ij is the jth individual
in the ith sample.

The s²_i(p) were arranged into a frequency distribution and the theoretical
frequencies calculated.

We found the mean and the standard deviation of the s²_i(p) to be as
follows:

s̄²(p) = 68.945,    s_s²(p) = 53.279

The test of the goodness of fit of the chi-square function gave
χ₀² = 11.836. Entering the table of χ² (Table III, Appendix) with 6 d.f.,
the probability was found to be .10 > P > .05. It was concluded that
the sampling distribution of the s²_i(p) follows the χ² distribution.

TABLE 7 

The 100 Values of the Sums of Squares Based on Samples of 5 from a Normal 
Population with Mean 30 and Standard Deviation 10 


 112.0    53.2    85.2   318.8   235.2   402.8   443.2   174.8   241.2   849.2
 229.2   408.8   186.0   370.8   138.8   180.8    60.8   442.0    97.2    25.2
 240.0    66.8   444.8   202.0   284.8   970.8   150.8   470.8   507.2  1613.2
 362.8   401.2   354.8   214.0   339.2   393.2   437.2   333.2   738.0   466.8
 169.2   518.0  1158.8   323.2   381.2   322.8   250.8    82.0   422.0    93.2
 181.2   584.8   168.8   176.8   218.8   180.0   154.0    74.0    86.0   231.2
  93.2    46.8   164.8   313.2   626.0    85.2   353.2   128.8   542.0   900.8
 354.8   730.8   234.8    57.2   187.2   485.2   981.2   257.2   176.8   764.8
 201.2   470.8   632.8   346.8   400.8   575.2   977.2   118.8    73.2   249.2
 439.2   455.2   433.2   228.8    48.8   534.8   390.0   111.2   303.2    63.2


Each of the s²_i(p) is an estimate of the variance, σ², of the population
(= 100) from which we were sampling. Therefore, we expect the mean
of the 100 samples of 5 to be approximately equal to σ², or 100. From
sampling theory it is known that the expected standard deviation of the
s²_i(p)'s is:

\[ \sigma_{s^2_{(p)}} = \frac{\sqrt{2(n-1)}}{n}\,\sigma^2 = \frac{100\sqrt{8}}{5} = 56.57 \tag{3.02} \]

Our calculated value of the standard deviation of the 100 obtained
values of s²_i(p) was 53.279. Thus, both the mean and the standard deviation
of the 100 sample values of s²_i(p) differ considerably from expectation.
Estimates are considered biased estimates if in repeated sampling their
mean or mathematical expectation does not equal the true, or population,
value. Our obtained mean value, s̄²(p), is too low. It is 1.22 standard
errors below the true value. Although the obtained mean is within the
limit of random sampling fluctuations, we shall consider whether or not
a closer agreement with expectation can be obtained.

We shall now calculate our estimate in a different manner. Define:

\[ s_{i(u)}^2 = \frac{\sum_j (X_{ij} - \bar{X}_i)^2}{n - 1} \qquad (i = 1, \ldots, 100;\ j = 1, \ldots, 5) \tag{3.03} \]

where the subscript (u) indicates the unbiased estimate.
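The two estimates differ only in the divisor applied to the same sum of squares. The sketch below makes the comparison concrete; the sample shown is hypothetical (it is not one of the samples of Table 2), and the function names are assumptions made for illustration.

```python
def sum_of_squares(sample):
    """Sum of squared deviations of the observations from their own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def pearsonian_variance(sample):
    """Divide the sum of squares by n, as in Equation (3.01)."""
    return sum_of_squares(sample) / len(sample)


def unbiased_variance(sample):
    """Divide the sum of squares by n - 1, as in Equation (3.03)."""
    return sum_of_squares(sample) / (len(sample) - 1)


sample = [22, 35, 29, 41, 28]   # hypothetical sample of 5
print(sum_of_squares(sample))        # 210.0
print(pearsonian_variance(sample))   # 42.0
print(unbiased_variance(sample))     # 52.5
```

For any sample of 5 the unbiased estimate is 5/4 of the Pearsonian one, so each entry of Table 7 divided by 4 gives the corresponding unbiased estimate.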

We calculated the 100 values of s²_i(u). They are recorded in Table 8.

TABLE 8

100 Unbiased Estimates, s²_i(u), Calculated from the Sums of Squares in Table 7



We now observe that our calculated value, s̄²(u) = 85.92, is .46 standard
error below the true value, well within the limits of random sampling
fluctuations.

We may now state that the usual method of calculating the estimate
of the population variance, that is, s²(p) as an estimate of σ², gives a
biased estimate. The explanation of the bias is that, if we use s²(p), its
mean in repeated sampling is not σ² but ((n − 1)/n)σ², where n is the size of
the sample. In our case:

\[ s_{(p)}^2 \text{ is an estimate of } \sigma^2\,\frac{n-1}{n}, \text{ or } 100 \cdot \tfrac{4}{5} = 80 \tag{3.08} \]

Now, our calculated value of s̄²(p) = 68.95 does not differ significantly
from the theoretical value 80; that is, it is .44 standard deviation below,
and the difference can be attributed to sampling error alone.

The amount of the bias in using s²(p) as an estimate of σ² is evidently
−σ²/n. The theoretical standard deviation of the distribution of s²(p), when
n is large, is σ²√(2(n − 1))/n. It may be worth noting the relative
magnitude of the bias and sampling error. The respective values for
various sizes of samples are recorded in Table 9.

The values in columns (2) and (3) of Table 9 show that the bias is
substantial in comparison with random sampling error, especially for
small samples, for instance n = 50 or less. The conclusion is that, since
there is no justification for willfully introducing a bias, the unbiased
estimate, s²(u), should be used when estimates of the population variance
are required as in problems of statistical inference. When mere description
is involved, s²(p) may properly be used.

TABLE 9

Comparison of the Bias in Using s²(p) as an Estimate of σ² with Sampling Errors

Size of sample    Relative amount of bias    Relative amount of sampling error
     (n)                   1/n                        √(2(n − 1))/n
     (1)                   (2)                             (3)

      2                    0.50                            0.71
      3                    0.33                            0.67
      5                    0.20                            0.57
     10                    0.10                            0.42
     20                    0.05                            0.31
     50                    0.02                            0.20
    100                    0.01                            0.14
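The two columns of Table 9 are simple functions of n and can be recomputed as a check. The sketch below evaluates 1/n and √(2(n − 1))/n for the sample sizes listed; the function names are assumptions made for illustration.

```python
import math


def relative_bias(n):
    """Bias of the Pearsonian estimate as a fraction of sigma squared: 1/n."""
    return 1 / n


def relative_sampling_error(n):
    """Standard deviation of the Pearsonian estimate as a fraction of
    sigma squared: sqrt(2(n - 1)) / n."""
    return math.sqrt(2 * (n - 1)) / n


print("  n   bias  sampling error")
for n in (2, 3, 5, 10, 20, 50, 100):
    print(f"{n:3d}   {relative_bias(n):.2f}   {relative_sampling_error(n):.2f}")
```

With σ² = 100 and n = 5 the second function gives 0.5657, that is, the value 56.57 of Equation (3.02).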


Likewise, it is customary to consider the square root of the unbiased
estimate of the variance as the unbiased estimate of the standard deviation;
that is,

\[ s_{(u)} = \sqrt{s_{(u)}^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \tag{3.09} \]

It may be stated, however, that although the mean of s²(u) in repeated
sampling equals σ², Equation (3.09) does not imply that in repeated sampling
the mean of s(u) equals σ. It is well known that the mean of a set of numbers
does not exactly equal the square root of the arithmetic mean of their squares.
To illustrate:

X̄ = (1/6)(1 + 3 + 5 + 6 + 8 + 9) = 5⅓

√(Mean of X²) = √((1/6)(1 + 9 + 25 + 36 + 64 + 81)) = 6
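The arithmetic of this illustration is easily verified. The few lines below compute both quantities for the six numbers used in the text; the variable names are chosen only for illustration.

```python
import math

values = [1, 3, 5, 6, 8, 9]     # the six numbers used in the text

mean = sum(values) / len(values)
root_mean_square = math.sqrt(sum(v * v for v in values) / len(values))

print(mean)               # 5 1/3
print(root_mean_square)   # 6.0
```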

We have arranged the hundred s²_i(u)'s in Table 8 into a frequency
distribution (Table 10). The theoretical values have also been calculated,
and the theoretical curve based on these has been constructed in Fig. 2.
The test of the goodness of fit of the theoretical to the observed
frequencies gives a χ₀² value of 15.25 with P > .05. The model, in this
case built up of the sampling errors of the variance, is known as the
chi-square curve. This model is important in statistical theory and
practice (see Table III, Appendix).


TABLE 10

Frequency Distribution of the 100 Unbiased Estimates s²(u) of the Population
Variance in Table 8 and the Test of Goodness of Fit

Interval                  f_o    f_t    f_o - f_t   (f_o - f_t)²/f_t

331.925 to   ∞
291.700 to 331.925
237.200 to 291.700
194.475 to 237.200          7     10       -3            0.90
149.725 to 194.475          5     10       -5            2.50
121.950 to 149.725          6     10       -4            1.60
 83.925 to 121.950         27     20        7            2.45
 54.875 to  83.925         16     20       -4            0.80
 41.225 to  54.875         14     10        4            1.60
 26.600 to  41.225          8     10       -2            0.40
 17.775 to  26.600          9      5        4            3.20
 10.725 to  17.775
  7.425 to  10.725
  0.000 to   7.425          8      5        3            1.80

Total                     100    100        0        χ₀² = 15.25

The four highest intervals are pooled into one class, as are the three lowest.

d.f. = 8; .10 > P > .05


The Sampling Distribution of t. We have now considered two principal
models, the normal and the chi-square, which the statistician has
developed. It is to be remembered that in the development of both of
these models it was assumed that the variance or standard deviation of
the population was known. It is not often the case, however, in experimental
work that the population value is known. Furthermore, the
experimenter is usually dealing with small samples. The construction
of a model against which experimental results of this kind could be compared
would indeed be a genuine contribution to the research worker. Let
us trace the way in which this problem was solved.

Figure 2. The chi-square distribution curve based on the unbiased estimates
of the variance, s²(u)'s, of 100 random samples of 5 from a normal population with
mean 30 and variance 100 (Table 10). Base scale: values of s²(u).

Since the population standard deviation is unknown, the only source
of information concerning it is that provided by the sample. It was
observed (see Table 8) that the sample variance, and hence the standard
deviation, is often very different from the population standard deviation.
Each of these variances or standard deviations was an estimate of the
population value. The smallest standard deviation was √6.30 or 2.51;
the largest, √403.30 or 20.08. It was essential that a model, to be effective,
should take these sampling fluctuations into account. This
was done by setting up a ratio of the difference between the sample mean
and population mean to its estimated standard error. This ratio was
called t. In mathematical terms we may proceed as follows:

Suppose that the parameter σ is unknown, though μ = 30 in our
parent population (μ may or may not be known). Define:

\[ t_i = \frac{\bar{X}_i - \mu}{\sqrt{\dfrac{\sum_j (X_{ij} - \bar{X}_i)^2}{n(n-1)}}} = \frac{(\bar{X}_i - \mu)\sqrt{n(n-1)}}{\sqrt{\sum_j (X_{ij} - \bar{X}_i)^2}} \qquad (i = 1, \ldots, 100;\ j = 1, \ldots, 5) \tag{3.10} \]

where X̄_i is the mean value of the ith sample and X_ij is the jth individual
in the ith sample. Then

\[ y = \frac{\Gamma\!\left(\frac{m+1}{2}\right)}{\sqrt{m\pi}\;\Gamma\!\left(\frac{m}{2}\right)} \left(1 + \frac{t^2}{m}\right)^{-\frac{m+1}{2}} \]

where y is the ordinate value for a specific value of t and m is the number
of degrees of freedom; Γ denotes a Gamma function.
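Both definitions translate directly into a few lines of code. The sketch below computes t for a single sample as in Equation (3.10) and the ordinate of the t-distribution for m degrees of freedom; the sample is hypothetical (not taken from Table 2) and the function names are assumptions made for illustration.

```python
import math


def t_statistic(sample, mu):
    """Ratio of (sample mean - mu) to the estimated standard error of the mean."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return (mean - mu) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


def t_density(t, m):
    """Ordinate of Student's t-distribution with m degrees of freedom."""
    coef = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))
    return coef * (1 + t * t / m) ** (-(m + 1) / 2)


sample = [28, 30, 32, 34, 36]     # hypothetical sample of 5
print(round(t_statistic(sample, 30.0), 4))
print(t_density(0.0, 4))          # central ordinate for m = 4
```

For this sample the mean is 32, the sum of squares is 40, and the estimated standard error is √2, so t = 2/√2.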

In our sampling experiment we took 100 samples of size 5,

μ = 30, n = 5, m = 4

The 100 t-values were calculated and these were recorded in Table 11.

TABLE 11

The 100 t-Values for 100 Random Samples of 5 from a Population with Mean
of 30 and Unknown Variance σ²



-2.9580  -5.1504   1.7442  -0.0501  -0.1166   1.5152  -0.2974  -2.0972   0.1728  -0.9822
-0.7680  -0.0442  -0.6558   0.2787   0.6833   0.9313   0.4588   0.4254  -1.6330   0.3563
 2.5981  -1.5321  -1.3147  -0.9440  -0.4770   0.3158   1.1654   1.1954  -0.8737   0.2672
 0.4226   0.1340   0.6648  -0.6114  -1.8454  -0.9923   1.6255  -1.0684  -0.9877   1.8215
-2.2691   0.3930   0.2890   0.3483   1.7408   0.4480  -0.5083  -1.9755   1.0885   0.6485
 2.8572   0.8877   0.4131   1.2781  -2.0559  -1.0000  -0.3604   2.5994   0.9645   0.4706
-0.7412   1.4382  -0.2787   1.1624   0.1787  -1.1628   2.5224  -0.8669   1.5368  -0.4172
 0.5223  -0.0331  -2.3932  -2.1287  -2.7456   0.9339  -1.0565  -2.0635  -0.2691   1.2614
 0.7567  -0.2473   1.3867   0.9126  -1.3850  -0.6340   0.2003  -3.3645  -2.8226  -0.1700
 0.3414   0.5450  -1.1603  -1.7148  -0.5121   0.3481  -0.9058  -4.4954   2.1574   1.4626


Theoretically, we have

\[ \mu_t = 0, \qquad \sigma_t = \sqrt{\frac{m}{m-2}} = 1.414 \]

where μ_t and σ_t are the mean and standard deviation of all the possible
t-values, respectively. For our 100 t-values, we have

\[ \bar{t} = \frac{\sum t_i}{100} = -.165, \qquad s_t = \sqrt{\frac{\sum (t_i - \bar{t})^2}{100}} = 1.490 \qquad (i = 1, \ldots, 100) \]

Again, we wish to test the goodness of fit for the t-distribution by
using the χ²-criterion. This test is given in Table 12.
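The theoretical standard deviation quoted above depends only on the degrees of freedom. A minimal sketch of the calculation follows; the function name is an assumption made for illustration, and the guard for m ≤ 2 reflects the fact that the variance of t is not finite in that case.

```python
import math


def t_standard_deviation(m):
    """Standard deviation of Student's t with m degrees of freedom (m > 2)."""
    if m <= 2:
        raise ValueError("the variance of t is not finite unless m > 2")
    return math.sqrt(m / (m - 2))


print(round(t_standard_deviation(4), 3))   # single samples of 5: m = 4
print(round(t_standard_deviation(8), 4))   # two samples of 5: m = 8
```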


TABLE 12

Distribution of the 100 t-Values from Means of Samples of 5 and Test of
Goodness of Fit

Class interval of t    f_o   f_o (pooled)    f_t    f_t (pooled)   f_o - f_t   (f_o - f_t)²   (f_o - f_t)²/f_t

 4.005 to  +∞            0        5          0.79       5.78         -0.78        0.6084          0.105
 3.005 to  4.005         0                   1.20
 2.005 to  3.005         5                   3.79
 1.005 to  2.005        15       15         12.78      12.78          2.22        4.9284          0.386
 0.005 to  1.005        30       30         31.25      31.25         -1.25        1.5625          0.050
-0.995 to  0.005        26       26         31.36      31.36         -5.36       28.7296          0.916
-1.995 to -0.995        12       12         13.00      13.00         -1.00        1.0000          0.077
-2.995 to -1.995         9       12          3.83       5.83          6.17       38.0689          6.530
-3.995 to -2.995         1                   1.19
  -∞   to -3.995         2                   0.81

Total                  100      100        100.00     100.00          0.00                    χ₀² = 8.064

d.f. = 5; P > .14
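The χ² value of Table 12 can be recomputed from the pooled observed and theoretical frequencies. The sketch below does only that arithmetic; the function name is an assumption made for illustration, and the probability itself is still to be read from a χ² table.

```python
observed = [5, 15, 30, 26, 12, 12]                     # f_o after pooling the tail classes
expected = [5.78, 12.78, 31.25, 31.36, 13.00, 5.83]    # f_t after pooling


def chi_square(f_o, f_t):
    """Sum of (f_o - f_t)^2 / f_t over the classes."""
    return sum((o - t) ** 2 / t for o, t in zip(f_o, f_t))


chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1
print(round(chi2, 3), degrees_of_freedom)
```

The degrees of freedom are taken as one less than the number of pooled classes, which agrees with the 5 d.f. stated for Table 12.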


Referring to the χ² table (Table III, Appendix) with χ² = 8.064 and
5 degrees of freedom, we find that P > .14. Therefore, we conclude
that our 100 t-values are distributed as the t-function.

We have arranged the 100 t-values in a frequency distribution and
plotted the histogram. The theoretical frequency distribution of t has
been calculated and the corresponding curve has been superimposed on
the histogram (Fig. 3). The theoretical frequency curve of the sampling
distribution of t is a symmetrical leptokurtic curve. Tables (see Table II,
Appendix) have been prepared which enable one to determine for a
given size of sample the probability of getting a value of t greater than or
equal to ±t₀, or the value in the sample, due to random sampling errors
alone in repeated sampling. Against this model, when it is appropriate
for the problem involved, the experimenter may then compare his
experimental results with the view of examining their significance.





Figure 3. Distribution of the t-values of 100 random samples of 5. Theoretical
curve of the t-distribution superimposed upon the histogram. Vertical scale:
frequency; base scale: values of t.

Contribution of “Student.” It is fitting here to point out the significance
of the contribution of the writer who signed himself “Student”
to the refinement of the classical theory of errors. First, it is usually
held that the date of his publication (Ref. 7), 1908, is the beginning of
modern statistical theory and practice. When Student began his work
as one of the brewers of Guinness, Son and Company, the available
statistical tools were postulated upon large sampling theory. In the
course of his work it was necessary for him to draw conclusions from the
results of small samples which themselves furnished the only indication
of their variability. Rigorous conclusions under such conditions became
possible through Student’s determination of the exact sampling distribution
of the statistic, thus making allowance for its sampling errors. He
demonstrated that notwithstanding these sampling errors, which in the
case of very small samples are large, it was possible to derive a test of
significance both rigorous and exact. Since the number of degrees of
freedom is one of the parameters in the equation of the sampling distribution,
the restriction previously set up, namely, that the sample must be
“large,” was removed.

The applicability of Student’s test has, of course, been greatly
extended by modern research in mathematical statistics. There are two
memorials to Student which will undoubtedly endure: (1) a “Studentized”
function, that is, a statistic whose sampling distribution, originally involving
the standard deviation of the population, is altered so that its sampling
distribution uses quantities calculated only from the sample; (2) an exact
test of significance, that is, a test which depends on a known probability
distribution and thus is independent of irrelevant unknown parameters.

The t-Distribution of the Difference between Means. More frequent
than the need of comparing experimental results of a single sample
with a model is that of comparing the results for two independent samples,
for instance, the difference between the means of the experimental and
control groups. Therefore, we present the results of an empirical sampling
experiment dealing with the model built up of the sampling
errors of differences between means. The samples used in this case
were those obtained in the sampling experiment described on page 37
and in Table 5, where 100 random samples of five for a pair of variables
X₁ and X₂, which are independent of each other, were taken. Here we
have assumed that we do not know the population standard deviation
and hence have to estimate it from the sample. The results in this case
are found to be described by the model t-distribution. Suppose that we
do not know the parameters, σ₁ and σ₂, though we know μ₁ = 30 and
μ₂ = 30 in our parent population (μ₁ and μ₂ may or may not be known).
Define:

\[ t_i = \frac{\bar{X}_{1i} - \bar{X}_{2i}}{\sqrt{\sum_j (X_{1ij} - \bar{X}_{1i})^2 + \sum_j (X_{2ij} - \bar{X}_{2i})^2}} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}} \tag{3.12} \]

where X̄_1i is the mean value of X₁ in the ith sample; X̄_2i is the mean
value of X₂ in the ith sample; X_1ij is the jth individual in the ith sample
for X₁; and X_2ij is the jth individual in the ith sample for X₂. Then it
may be shown (Ref. 1) that for samples of n₁ and n₂ from a normal population,
the distribution of t is given by

\[ y = \frac{\Gamma\!\left(\frac{m+1}{2}\right)}{\sqrt{m\pi}\;\Gamma\!\left(\frac{m}{2}\right)} \left(1 + \frac{t^2}{m}\right)^{-\frac{m+1}{2}} \tag{3.13} \]

where y is the ordinate value for a specific value of t and m is the number
of degrees of freedom. In our case,

n₁ = 5, n₂ = 5, m = n₁ + n₂ − 2 = 8

The t-values have been calculated for the 100 samples of 5 for a pair
of values X₁ and X₂ and are recorded in Table 13.

Theoretically, we have

\[ \mu_t = 0, \qquad \sigma_t = \sqrt{\frac{m}{m-2}} = 1.1547 \]

where μ_t and σ_t are the mean and standard deviation of all the possible
t-values, respectively. For our 100 t-values, we have



Finally, we wish to test the goodness of fit for the t-distribution by
using the χ²-criterion. The test is given in Table 14.
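The t of Equation (3.12) can be computed for any pair of independent samples as sketched below. The code uses the familiar pooled-variance form, which is algebraically the same ratio; the two samples are hypothetical (not taken from the tables) and the function name is an assumption made for illustration.

```python
import math


def two_sample_t(x1, x2):
    """Student's t for the difference between the means of two independent
    samples, with the variance estimated from the pooled sums of squares.
    Returns the t-value and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - mean1) ** 2 for x in x1)
    ss2 = sum((x - mean2) ** 2 for x in x2)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Hypothetical samples of 5.
t, m = two_sample_t([28, 30, 32, 34, 36], [26, 28, 30, 32, 34])
print(round(t, 4), m)
```

Here the means differ by 2, the pooled variance is 80/8 = 10, and the standard error is √(10 · 2/5) = 2, so t = 1 with 8 degrees of freedom.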

TABLE 13 

The 100 t-Values for Mean Differences of 100 Random Samples of 5 for
Pairs of Values X₁ and X₂


- 0.9851 

- 2.4885 

1.5956 

0.0000 

- 0.1539 

3.2678 

0.6438 

- 0.0719 

0.0591 

0.0708 

- 0.5728 

- 0.6053 

- 0.2113 

0.2773 

1.4153 

0.1244 

- 0.7650 

0.6870 

- 0.3565 

0.3285 

1.3229 

- 0.5523 

- 1.6920 

- 1.7380 

0.1232 

- 0.0400 

1.7302 

3.0993 

- 2.1881 

- 0.2028 

0.2387 

0.0299 

0.7405 

1.8521 

— 0.1302 

- 0.5922 

2.1985 

0.2455 

- 0.7852 

1.0640 

0.2619 

2.1117 

0.3732 

0.8348 

2.3370 

0.0969 

- 1.1153 

- 1.2291 

0.3926 

3.2338 

0.8221 

0.0856 

- 0.3300 

0.9167 

- 0.2725 

0.0358 

- 0.8355 

- 0.5859 

- 0.6295 

- 1.7950 

-i 0.5491 

- 0.2123 

- 0.1353 

0.5357 

0.6711 

0.1451 

2.1691 

- 4.3324 

1.7070 

0.4798 

- 0.5952 

0.0301 

- 2.3705 

0.0983 

- 1.4368 

0.4930 

0.5699 

- 0.4897 

- 1.2175 

1.1948 

1.8076 

- 0.6283 

1.1622 

- 0.8290 

- 0.9685 

- 0.8922 

0.4934 

- 1.8228 

- 1.2257 

0.2524 

0.5445 

0.9909 

- 0.5337 

- 0.1298 

- 0.1801 

- 0.6218 

- 1.5863 

- 2.6307 

0.4127 

0.1375 


Referring to the χ² table (Table III, Appendix) with χ² = 8.817 and
with 7 degrees of freedom, we find that P > .25. Therefore, we conclude
that our 100 t-values are distributed as the t-function.

The Sampling Distribution of the Correlation Coefficient. We now
present the results of a sampling experiment which illustrate the theoretical
or statistical model built up from the sampling errors of the
correlation coefficient in repeated random sampling from a population in
which the true correlation is known to be zero.

The samples used were the ones obtained by taking 100 samples of
5 pairs of values from a normal population in which there was no correlation
at all between the variables (see page 37).

TABLE 14

Test of Goodness of Fit of the Theoretical t-Distribution for the Observed
t-Values of Table 13

Classes               f_o   f_o (pooled)    f_t    f_o - f_t   (f_o - f_t)²   (f_o - f_t)²/f_t

 3.505 to   ∞           0       12          8.57      3.43       11.7649          1.373
 3.005 to  3.505        3
 2.505 to  3.005        0
 2.005 to  2.505        4
 1.505 to  2.005        5
 1.005 to  1.505        5        5          8.66     -3.66       13.3956          1.547
 0.505 to  1.005       11       11         14.11     -3.11        9.6721          0.685
 0.005 to  0.505       24       24         18.46      5.54       30.6916          1.663
-0.495 to  0.005       15       15         18.58     -3.58       12.8164          0.690
-0.995 to -0.495       18       18         14.16      3.84       14.7456          1.041
-1.495 to -0.995        5        5          8.77     -3.77       14.2129          1.621
-1.995 to -1.495        5       10          8.69      1.31        1.7161          0.197
-2.495 to -1.995        3
-2.995 to -2.495        1
-3.495 to -2.995        1
  -∞   to -3.495        0

Total                 100      100        100.00      0.00                    χ₀² = 8.817

d.f. = 7; P > .25

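Let us define:

$$r_i = \frac{\sum_j (X_{1ij} - \bar{X}_{1i})(X_{2ij} - \bar{X}_{2i})}{\sqrt{\sum_j (X_{1ij} - \bar{X}_{1i})^2 \, \sum_j (X_{2ij} - \bar{X}_{2i})^2}}$$

where $\bar{X}_{1i}$ and $\bar{X}_{2i}$ are the means for $X_1$ and $X_2$, respectively, in the $i$th
sample, and $X_{1ij}$ and $X_{2ij}$ are the $j$th individual in the $i$th sample for
$X_1$ and $X_2$, respectively. The $r$-values for the 100 samples in Table 5
have been calculated and recorded in Table 15.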

Then it may be shown that r is distributed in repeated sampling in the 
following function: 



(3.15)
where y is the ordinate value for a specific value of r and n is the sample
size. In our case, n = 5.
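For reference, the sampling distribution of r when the population correlation is zero is usually written in the form below. This is a sketch taken from general sampling theory and is only assumed to agree with Formula (3.15); in particular the normalizing constant, written here with gamma functions, may be expressed differently in the original.

```latex
y = \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-2}{2}\right)}
    \left(1 - r^2\right)^{\frac{n-4}{2}}, \qquad -1 \le r \le 1 .
% For n = 5 this becomes y = (2/\pi)\,(1 - r^2)^{1/2},
% whose variance is 1/4, i.e. a standard deviation of 1/\sqrt{n-1} = .500.
```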


TABLE 15

The 100 Values of the Correlation Coefficient r Calculated for 100 Random
Samples of 5 from a Population in Which the True Correlation Is Zero

-.829  -.964   .359  -.310   .294  -.669  -.134   .349   .876  -.614
-.692  -.862   .168  -.748   .482   .142  -.091  -.235   .663  -.726
 .126  -.926   .289   .152  -.640   .548  -.403  -.706  -.274  -.114
 .522   .391  -.636   .370   .114   .830   .111   .111   .150   .482
 .563   .437   .196   .516   .720   .678   .733  -.212   .007   .744
-.392   .196  -.108  -.124  -.690   .378   .847   .196   .640  -.885
-.626   .079  -.631  -.589   .665  -.613   .549  -.304   .221  -.763
-.032  -.483  -.294   .216   .737   .528   .368  -.716  -.638  -.238
 .481  -.881  -.388   .075  -.832  -.296  -.276  -.486  -.010  -.349
 .298   .089  -.232   .725  -.574  -.278   .120   .725   .159  -.473


Theoretically, we have

$$\mu_r = 0$$

$$\sigma_r = \frac{1}{\sqrt{n - 1}} = .500$$

where $\mu_r$ and $\sigma_r$ are the mean and the standard deviation of all the possible
$r$-values, respectively. For our 100 $r$-values, we have

$$\bar{r} = \frac{\sum r_i}{100} = -.0596$$

$$s_r = \sqrt{\frac{\sum (r_i - \bar{r})^2}{100}} = .5076$$

Finally, we wish to test the goodness of fit for the $r$-distribution by
using the $\chi^2$-criterion.⁴ The test is given in Table 16.


4 F. N. David, Tables of the Correlation Coefficient, Biometrika Office, University 
College, London, 1938. 
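The sampling experiment described above can be imitated on a computer. The following is a minimal sketch, not the procedure used for Table 15: it assumes pseudo-random normal deviates in place of the drawn samples, and the function names, the seed, and the choice of 20,000 samples are illustrative only.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_r(num_samples, n, seed=0):
    """Draw num_samples samples of n uncorrelated normal pairs; return their r-values."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(pearson_r(xs, ys))
    return values

def mean_and_sd(values):
    """Mean and standard deviation (divisor = number of values, as in the text)."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

rs = simulate_r(20000, 5)          # many more samples than the 100 of the text
r_bar, s_r = mean_and_sd(rs)
sigma_r = 1 / math.sqrt(5 - 1)     # theoretical value, .500 for n = 5
print(round(r_bar, 4), round(s_r, 4), sigma_r)
```

With only 100 samples, as in the text, the observed mean and standard deviation would be expected to wander further from 0 and .500 than they do here.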
Referring to the $\chi^2$-table (Table III, Appendix) with $\chi^2 = 12.004$
and with 8 degrees of freedom, we have P > .14. Therefore, we conclude
that our 100 $r$-values are distributed as the $r$-function [Formula
(3.15)].

It is known from sampling theory that for large samples, where n is
larger than 100, r is approximately normally distributed about zero
($\rho = 0$) with a standard deviation equal to $1/\sqrt{n - 1}$. In our sampling
experiment it is noted that the mean of the 100 sample values of r was
-.0596 and that the standard deviation was .5076. Even with samples

TABLE 16

Distribution of 100 Correlation Coefficients and the Test of Goodness of
Fit of the Theoretical for Observed Values of r

Class interval         f_o      f_t     f_o - f_t   (f_o - f_t)^2   (f_o - f_t)^2 / f_t

  .805 to  1.000 }
  .605 to   .805 }      11     13.98     -2.98         8.8804          0.635
  .405 to   .605        10     10.96     -0.96         0.9216          0.084
  .205 to   .405        11     12.10     -1.10         1.2100          0.100
  .005 to   .205        18     12.64      5.36        28.7296          2.273
 -.195 to   .005         6     12.65     -6.65        44.2225          3.496
 -.395 to  -.195        14     12.14      1.86         3.4596          0.285
 -.595 to  -.395         8     11.03     -3.03         9.1809          0.832
 -.795 to  -.595        15      9.10      5.90        34.8100          3.825
-1.000 to  -.795         7      5.40      1.60         2.5600          0.474

Total                  100                             $\chi_0^2$ =   12.004

d.f. = 8; P > .14



of n = 5, these values agree closely with the expected values of 0 and
.500. At least for large samples, the normal curve might be used for the
mathematical model when the true value of $\rho = 0$, against which the
experimental results might be compared. An exact test, however, is
available based on the $t$-distribution as outlined above. In this case,

$$t = \frac{r\sqrt{f}}{\sqrt{1 - r^2}}, \qquad f = n - 2 \qquad (3.16)$$



This test is particularly useful for small samples. 
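As an illustration of Formula (3.16), the short sketch below converts a sample correlation into the corresponding value of t. The function name is hypothetical, and the example value r = .876 with n = 5 is simply one of the entries of Table 15 chosen for illustration.

```python
import math

def t_from_r(r, n):
    """Return (t, f) of Formula (3.16) for a sample correlation r from n pairs."""
    f = n - 2                      # degrees of freedom
    if f < 1:
        raise ValueError("at least 3 pairs are needed")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(f) / math.sqrt(1 - r * r), f

t_value, dof = t_from_r(0.876, 5)
print(round(t_value, 3), dof)
```

The resulting t would then be referred to the t-table with f degrees of freedom.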

When the correlation in the population is not zero, that is, when $\rho \neq 0$,
the sampling distribution of r is distributed about $\rho$ with a standard
deviation, or standard error, approximately equal to $(1 - \rho^2)/\sqrt{n - 1}$.
When $\rho = 0$, this reduces to the standard deviation given above, or
$1/\sqrt{n - 1}$. With large samples and moderate or small values of $\rho$ the
sample value r may be substituted for the unavailable $\rho$, for example,
$(1 - r^2)/\sqrt{n - 1}$ as a measure of the sampling error of r. With small
samples, however, the sample value, r, often differs greatly from the true
value. Furthermore, the sampling distribution departs widely from
normality so that the test of significance based upon the formula for large
samples may be highly misleading. The constants of distribution of r
for samples of n = 20 from a normal population as given in Table 17
are illustrative of this point.⁶


TABLE 17

Constants of Distribution of r in Samples (N = 20) from a Normal Population

$\rho$                          0       .2       .4       .6       .8       .9

$\beta_1$                               0.066    0.260    0.650    1.400    2.060
$\beta_2$                               2.820    3.170    3.910    5.420    6.870
$\sigma_r$                              0.221    0.197    0.154    0.091    0.049
$(1 - \rho^2)/\sqrt{n - 1}$             0.220    0.193    0.147    0.083    0.044
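The last row of Table 17 follows directly from the large-sample formula $(1 - \rho^2)/\sqrt{n - 1}$ with n = 20. A minimal sketch (the function name is an assumption made for illustration):

```python
import math

def large_sample_se_of_r(rho, n):
    """Large-sample standard error of r: (1 - rho^2) / sqrt(n - 1)."""
    return (1 - rho ** 2) / math.sqrt(n - 1)

# Values of rho for which Table 17 lists an entry, with n = 20.
approx = {rho: round(large_sample_se_of_r(rho, 20), 3)
          for rho in (0.2, 0.4, 0.6, 0.8, 0.9)}
print(approx)
```

Comparing these figures with the row for $\sigma_r$ shows how the approximation falls increasingly below the true standard deviation as $\rho$ grows.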







Fisher solved these and related problems by using the transformation

$$z' = \tanh^{-1} r = \tfrac{1}{2} \log_e \left( \frac{1 + r}{1 - r} \right) \qquad (3.17)$$

$z'$ is to a first approximation normally distributed about the population
value $\zeta + \dfrac{\rho}{2(n - 1)}$, where $\zeta = \tanh^{-1} \rho$, for all values of $\rho$,
with a standard deviation $\sqrt{\dfrac{1}{n - 3}}$.

The form of the distribution of $z'$ is nearly independent of the value of $\rho$
in the population. The close approximation to normality of the $z'$-distribution
is noted from the constants of distribution of $z'$ given in Table
18. 
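The transformation (3.17) and the approximate mean and standard deviation of $z'$ can be evaluated as follows. This is a sketch only; the function names are hypothetical, and the case $\rho = .6$, n = 20 is chosen to match one row of Table 18.

```python
import math

def fisher_z(r):
    """Fisher's transformation z' = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_mean_and_sd(rho, n):
    """Approximate mean and standard deviation of z' for samples of n."""
    zeta = fisher_z(rho)
    return zeta + rho / (2 * (n - 1)), math.sqrt(1 / (n - 3))

mean_z, sd_z = z_mean_and_sd(0.6, 20)
bias = mean_z - fisher_z(0.6)      # the quantity tabulated as Mean (z' - zeta)
print(round(bias, 4), round(sd_z, 4))
```

These first-approximation values lie close to, but need not coincide exactly with, the constants tabulated in Table 18.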


TABLE 18

Constants of Distribution of z' in Samples (N = 20) from a Normal Population

$\rho$     Mean ($z' - \zeta$)     $\sigma_{z'}$     $\beta_1$     $\beta_2$

 0             .0000               .2423           .0000         3.116
 .2            .0053               .2422           .0000         3.117
 .6            .0159               .2412           .0000         3.118
 .9            .0249               .2398           .0000         3.114


Other Uses of Statistical Models. We have now illustrated how the
statistician builds up statistical or mathematical models against which 
experimental results may be checked with a view to examining their 
significance. In order for the reader to gain an insight into the process, a 


⁶ See page 149 for criteria of normality.
series of empirical sampling experiments was presented. The three
principal models illustrated were the normal, the $t$, and the chi-square.
Certain other uses of one or another of these models and of models not 
previously illustrated for the research worker will now be considered. 

The Difference between Correlation Coefficients. The research worker is
frequently interested in comparing the relative intensity of relationships
for different characters. Although an exact test of significance is not
available for such purposes, a test based on Fisher’s $z'$-transformation
of the correlation coefficient is valuable and sufficiently accurate for most
practical problems (Ref. 3). Let

$$z'_1 = \tfrac{1}{2} \log_e \frac{1 + r_1}{1 - r_1}, \qquad z'_2 = \tfrac{1}{2} \log_e \frac{1 + r_2}{1 - r_2}$$

where $r_1$ and $r_2$ are two correlation coefficients calculated from random
samples of $n_1$ and $n_2$ individuals, respectively.

$z'_1 - z'_2$ varies normally about

$$\frac{\rho}{2} \left( \frac{1}{n_1 - 1} - \frac{1}{n_2 - 1} \right) \quad (\doteq 0)$$

with standard deviation

$$\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}$$

Therefore, the quantity

$$X = \frac{z'_1 - z'_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} \qquad (3.18)$$

may be assumed to be normally distributed about zero with a standard
deviation of unity when the true correlation coefficients in the sampled
parent population are in fact equal. The sampling distribution, then, of
X in repeated sampling may be assumed to be normal, and the experimental
result, $X_0$, may be compared against the normal model.
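A minimal sketch of the test based on Formula (3.18) is given below. The function names and the numerical values (r = .60 from 53 individuals, r = .40 from 103) are hypothetical and serve only to show the arithmetic; the two-sided probability is obtained from the normal model with the standard-library error function.

```python
import math

def fisher_z(r):
    """Fisher's transformation z' = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """Return the normal deviate X of (3.18) and its two-sided probability."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    x = (fisher_z(r1) - fisher_z(r2)) / se
    p_two_sided = math.erfc(abs(x) / math.sqrt(2))
    return x, p_two_sided

x0, p = compare_correlations(0.60, 53, 0.40, 103)
print(round(x0, 3), round(p, 3))
```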

The Combination of Correlation Coefficients. The $z'$-transformation is
valuable for use in problems involving the averaging of several sample
values of r from the same population in order to get the combined estimate
of $\rho$. Thus the weighted arithmetical mean is

$$\bar{z}' = \frac{(n_1 - 3) z'_1 + (n_2 - 3) z'_2 + \cdots}{(n_1 - 3) + (n_2 - 3) + \cdots} \qquad (3.19)$$

and the standard error of $\bar{z}'$ is

$$s_{\bar{z}'} = \frac{1}{\sqrt{(n_1 - 3) + (n_2 - 3) + \cdots}} \qquad (3.20)$$

The ratio $X_0 = \bar{z}'/s_{\bar{z}'}$ may then be referred to the normal model to determine
the probability that a value as great as or greater than $X_0$ could be
obtained in repeated sampling by random sampling errors alone.
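Formulas (3.19) and (3.20) can be sketched in a few lines. The function name and the two sample correlations used here are hypothetical; the final back-transformation to the scale of r by the hyperbolic tangent is simply the inverse of (3.17) and is added for convenience rather than taken from the text.

```python
import math

def combine_correlations(rs, ns):
    """Weighted mean of z' (3.19), its standard error (3.20), and their ratio."""
    weights = [n - 3 for n in ns]
    total = sum(weights)
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / total
    se = 1 / math.sqrt(total)
    return z_bar, se, z_bar / se

z_bar, se, x0 = combine_correlations([0.5, 0.5], [20, 30])
r_combined = math.tanh(z_bar)      # combined estimate expressed as a correlation
print(round(z_bar, 4), round(se, 4), round(x0, 3), round(r_combined, 3))
```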
Correlations on the Same Sample. Comparisons are sometimes made
among correlation coefficients based on the same sample. Hotelling
(Ref. 4) has given the exact solution of the problem of testing the significance
of the difference between $r_{y1}$ and $r_{y2}$ under the conditions that
the significance is to be interpreted with respect to subpopulations of
possible samples for which predictors $X_1$ and $X_2$ take the same set of
values as those found in the obtained sample. Thus, $F$, or the variance
ratio, is

$$F = \frac{(r_{y1} - r_{y2})^2 (N - 3)(1 + r_{12})}{2(1 - r_{12}^2 - r_{y1}^2 - r_{y2}^2 + 2 r_{12} r_{y1} r_{y2})}, \qquad n_1 = 1;\; n_2 = N - 3 \qquad (3.21)$$

where $r_{y1}$ is the correlation coefficient between the predictor $X_1$ and the
predictand y; $r_{y2}$, between $X_2$ and y; and $r_{12}$ is the correlation between
$X_1$ and $X_2$.
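The variance ratio of Formula (3.21) is easily computed once the three correlations are known. The sketch below uses a hypothetical function name and hypothetical values ($r_{y1} = .6$, $r_{y2} = .4$, $r_{12} = .5$, N = 28) purely to show the arithmetic.

```python
def hotelling_f(r_y1, r_y2, r_12, n_obs):
    """Return (F, n1, n2) of Formula (3.21) for two correlations sharing y."""
    det = 1 - r_12 ** 2 - r_y1 ** 2 - r_y2 ** 2 + 2 * r_12 * r_y1 * r_y2
    if det <= 0:
        raise ValueError("the three correlations are not mutually consistent")
    f = (r_y1 - r_y2) ** 2 * (n_obs - 3) * (1 + r_12) / (2 * det)
    return f, 1, n_obs - 3

f_value, n1, n2 = hotelling_f(0.6, 0.4, 0.5, 28)
print(round(f_value, 3), n1, n2)
```

The value of F would then be referred to the F-table with $n_1 = 1$ and $n_2 = N - 3$ degrees of freedom.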

The assumption underlying the test is that (1) y has the univariate
normal distribution for each set of values of $X_1$ and $X_2$, independently
for the different sets with (2) a common variance $\sigma^2$ and (3) linear regression
of y on $X_1$ and $X_2$, respectively.

Hotelling also developed formulas for determining the selection of (a)
one variate from among three or more and (b) additional variates when
some have been chosen. His principal solutions of tests of significant
differences among $r_{y1}, \ldots, r_{yp}$ are given in Ref. 4.

Fisher’s $z$-Distribution and the Related $F$-Distribution. A mathematical
model which has played an important role in modern statistical
analysis is the $z$-distribution developed by Fisher.

The quantity z is equal to one-half the difference of the natural
logarithms of two independent estimates of the same population variance,
or to the difference of the natural logarithms of the corresponding standard
deviations. This distribution serves as the model against which
tests of significance of experimental results attained in the analysis of
variance and in multiple regression problems (to be discussed later) are
compared.

Thus, suppose we have two samples of sizes $N_1$ and $N_2$, each drawn
at random from one of two populations of variates normally distributed
with equal population variances $\sigma^2$.

Compute from the two samples

$$s_1^2 = \frac{\sum_{i=1}^{N_1} (X_i - \bar{X}_1)^2}{n_1} \quad \text{and} \quad s_2^2 = \frac{\sum_{i=1}^{N_2} (X_i - \bar{X}_2)^2}{n_2}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the respective means; $s_1^2$ and $s_2^2$ are the respective
variance estimates; and $n_1 = N_1 - 1$, $n_2 = N_2 - 1$. Then

$$z = \tfrac{1}{2} \log_e \frac{s_1^2}{s_2^2} = \log_e \frac{s_1}{s_2} \qquad (3.22)$$
$z$ is distributed in the form

$$y = y_0 \frac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1 + n_2)}} \qquad (3.23)$$

where $y_0$ may be taken such that the area of the curve is unity, and the
experimental value, $z_0$, may be compared with the model to determine
the probability that values of z equal to or greater than $z_0$ could be
obtained by random sampling errors alone. The probability P will be
given by the area under the curve to the right of the ordinate erected at $z_0$.

Fisher (Ref. 3) has computed tables giving values of z corresponding
to different values of $n_1$ and $n_2$, and P, namely, the 5, 1, and 0.10 per cent
points of the $z$-distribution. It should be pointed out that the table
gives the values of z at which ordinates cut off “tails” of 5, 1, and 0.10 per
cent of the total area of the curve for values of $n_1$ and $n_2$ chosen so that
$n_1$ corresponds to the number of degrees of freedom associated with the
larger of the two estimates of variance.

The $z$-distribution is unimodal and symmetrical if $n_1 = n_2$. For large
values of $n_1$ and $n_2$ and also for moderate values when $n_1$ and $n_2$ are equal
or nearly equal, the distribution of z becomes nearly normal about a
mean of zero with a standard deviation, or standard error,

$$\sigma_z = \sqrt{\tfrac{1}{2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$



It is to be noted that z is a Studentized function; hence it is especially
appropriate for small samples. The $z$-test may be regarded as an
extension of the $t$-test to situations where more than two variants are
under comparison. In fact, Fisher (Ref. 2) has shown that the normal
curve, the $\chi^2$-distribution, and Student’s distribution are included as
special cases of the two-parameter family of curves represented by the
$z$-distribution. For instance, since $z = \log_e t$, the values for $n_1 = 1$ in
the table of z are the logarithms of the values for P = .05 and P = .01 in
the table of t (Ref. 3).

Tables of the variance ratio

$$F = e^{2z} = \frac{s_1^2}{s_2^2} \qquad (3.24)$$

are available (see Table IV, Appendix) and are coming to be more commonly
used than the table of z, since the troublesome logarithmic transformation
is thereby avoided. Against this advantage perhaps is the
advantage of greater accuracy in the use of the $z$-tables when interpolations
are required. Tables for seven points of the F distribution are now
available (Ref. 6).
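The relation between z of Formula (3.22) and the variance ratio F of Formula (3.24) may be illustrated as follows. This is a sketch under stated assumptions: the two small samples are invented, the function names are hypothetical, and the larger variance estimate is placed in the numerator in keeping with the convention of the tables.

```python
import math

def variance_estimate(xs):
    """Return (s^2, n) with n = N - 1 degrees of freedom."""
    n = len(xs) - 1
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / n, n

def fisher_z_and_f(sample1, sample2):
    """Return (z, F, n1, n2), the larger variance estimate taken as s1^2."""
    s1, n1 = variance_estimate(sample1)
    s2, n2 = variance_estimate(sample2)
    if s2 > s1:                    # keep the larger estimate in the numerator
        s1, n1, s2, n2 = s2, n2, s1, n1
    z = 0.5 * math.log(s1 / s2)
    return z, math.exp(2 * z), n1, n2

sample_a = [31, 25, 42, 18, 36, 29]   # hypothetical observations
sample_b = [30, 28, 33, 27, 32]       # hypothetical observations
z, f, n1, n2 = fisher_z_and_f(sample_a, sample_b)
print(round(z, 4), round(f, 4), n1, n2)
```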

The Binomial Distribution in Sampling Theory. We have previously
described the binomial distribution and indicated that the normal distribution
may be used as an approximation to it (see page 27). Since
this distribution plays such a significant role in sampling theory, it should
be considered somewhat more broadly.

A Sampling Experiment Leading to the Binomial Distribution. We
begin by presenting the results of a simple sampling experiment consisting
of the tossing of 10 coins 512 times.

The record of this experiment is available in Table 19. The observed
frequencies for the several proportions of success, that is, the proportion of tails,
$X_p$, are given in column 2. The calculations for the mean and standard
deviation of the proportion of successes are given in columns 3 and 4. It is
found that the mean $\bar{X} = 0.5$ and that the standard deviation s = .162.
The corresponding theoretical values are 0.5 and .158, respectively.


TABLE 19

The Test of Goodness of Fit of the Theoretical Binomial Distribution for
the Observed Distribution of Successes (the Proportion of Tails) from
512 Tosses of 10 Coins at a Time

 X_p     f_o      fX      fX^2      f_t      f_o - f_t   (f_o - f_t)^2 / f_t
 (1)     (2)      (3)     (4)       (5)        (6)            (7)

 1.0       2      2.0     2.00      0.5 }
 0.9       5      4.5     4.05      5.0 }      1.5          0.4091
 0.8      15     12.0     9.60     22.5       -7.5          2.5000
 0.7      68     47.6    33.32     60.0        8.0          1.0667
 0.6     105     63.0    37.80    105.0        0.0          0.0000
 0.5     134     67.0    33.50    126.0        8.0          0.5079
 0.4      95     38.0    15.20    105.0      -10.0          0.9524
 0.3      55     16.5     4.95     60.0       -5.0          0.4167
 0.2      23      4.6     0.92     22.5        0.5          0.0111
 0.1       8      0.8     0.08      5.0 }
 0.0       2      0.0     0.00      0.5 }      4.5          3.6818

 Total   512    256.0   141.42    512.00       $\chi_0^2$ = 9.5457

d.f. = 8; .30 > P > .20

Sample values:

$$\bar{X} = \frac{256}{512} = 0.5, \qquad s = \sqrt{\frac{141.42}{512} - .25} = .162$$

Population values:

$$\mu = 0.5, \qquad \sigma = \sqrt{\frac{.5 \times .5}{10}} = .158$$


In column 5 the theoretical values $f_t$ are given. Finally, we tested
the agreement between the observed and theoretical values by means
of the $\chi^2$-test [column (7)]. We wish to test for goodness of fit and
enter the $\chi^2$-table (Table III, Appendix) with $\chi_0^2 = 9.5457$ and 8 degrees
of freedom. It is found that P > .20. We conclude that the observed
distribution may be regarded as in accordance with the binomial distribution
law; that is, the discrepancies between the observed and theoretical
frequencies may be attributable to random sampling fluctuations. The
theoretical basis of the binomial distribution is given below.
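The goodness-of-fit calculation of Table 19 can be retraced with a short sketch. The theoretical frequencies are taken as 512 times the terms of $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$, and the two extreme classes at each end are pooled as in the table. The variable names are illustrative, and the pooled count of 7 for the first class is an assumption inferred from the tabulated difference of 1.5 against a theoretical frequency of 5.5.

```python
from math import comb

N_TOSSES = 512
N_COINS = 10

# Theoretical frequencies f_t for 10, 9, ..., 0 tails: 512 * C(10, k) / 2**10.
expected = [N_TOSSES * comb(N_COINS, k) / 2 ** N_COINS
            for k in range(N_COINS, -1, -1)]

# Pool the two extreme classes at each end, as in Table 19.
expected_grouped = ([expected[0] + expected[1]]
                    + expected[2:-2]
                    + [expected[-2] + expected[-1]])

# Observed frequencies f_o for the same nine pooled classes.
observed_grouped = [7, 15, 68, 105, 134, 95, 55, 23, 10]

chi_square = sum((fo - ft) ** 2 / ft
                 for fo, ft in zip(observed_grouped, expected_grouped))
degrees_of_freedom = len(observed_grouped) - 1
print(round(chi_square, 4), degrees_of_freedom)
```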

The Binomial Expansion. Assume that we take N random samples
each of size n and in each of which a specific number t possess a given
character a and the remainder n - t do not possess the character. Let
$\dfrac{t}{n} = p$ and $1 - \dfrac{t}{n} = q$; then the frequencies of samples with t = 0, 1, 2,
. . . , n are given by terms in the series $N(q + p)^n$; that is,

$$N \left[ q^n + n q^{n-1} p + \frac{n(n - 1)}{2!} p^2 q^{n-2} + \cdots + \frac{n!\, p^t q^{n-t}}{t!\,(n - t)!} + \cdots + p^n \right] \qquad (3.25)$$



The terms in the expansion $(q + p)^n$ are relative frequencies in the
frequency distribution of all possible different samples, classified by
number of successes, say t, that may be drawn from the population
according to the rules of simple random sampling. The distribution
may be called the sampling distribution of the number of successes,
t = 0, 1, 2, . . . , t, . . . , n. It is more commonly known as the binomial
distribution, since it results from the expansion of $(q + p)^n$.

The mean of the distribution is given by

$$M = np \qquad (3.26)$$

and the variance

$$\sigma^2 = npq \qquad (3.27)$$

If instead of the actual number of t’s in each sample the proportion
of t’s, that is, $\dfrac{1}{n}$th of the number in each sample, is recorded, the mean
proportion of t’s would be

$$M = p \qquad (3.28)$$

and the variance

$$\sigma^2 = \frac{pq}{n} \qquad (3.29)$$



The standard deviation of the sampling distribution or the standard 
error provides a basis for judging the exceptionalness of any obtained 
sample, as illustrated in the following example. 
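The mean and variance of the binomial distribution can be checked numerically from the terms of the expansion. The sketch below does so for the case of the coin experiment (n = 10, p = 1/2); the function and variable names are illustrative assumptions.

```python
from math import comb

def binomial_terms(n, p):
    """Relative frequencies of t = 0, 1, ..., n successes: terms of (q + p)^n."""
    q = 1 - p
    return [comb(n, t) * p ** t * q ** (n - t) for t in range(n + 1)]

n, p = 10, 0.5
terms = binomial_terms(n, p)

# Mean and variance of the number of successes, computed term by term.
mean_count = sum(t * f for t, f in enumerate(terms))
var_count = sum((t - mean_count) ** 2 * f for t, f in enumerate(terms))

# The same constants for the proportion of successes t / n.
mean_prop = mean_count / n
var_prop = var_count / n ** 2
print(mean_count, var_count, mean_prop, var_prop)
```

The results agree with np and npq for the number of successes, and with p and pq/n for the proportion.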

Example 2. The Measurement of Exceptionalness. Assume that in
a random sample of 50 individuals, 16 have a character, say A. Is this
exceptional? It is known that in the general population 20 per cent
possess the character A.

For random samples of 50 individuals, the exact proportion of samples in
which 16 or more individuals would be expected to have character A is given
by the sum of the terms of the expansion of the binomial
$(\tfrac{4}{5} + \tfrac{1}{5})^{50}$ from the seventeenth term
onward. This proportion equals .031. This method, though exact, is
extremely laborious. For this reason it is advantageous to use an alternative
method.

If $p = q = \tfrac{1}{2}$ and n is large, the area under the appropriate section
of a normal curve gives a close approximation to the point distribution
of the binomial. Departures from the given conditions result in less
accurate approximations. For example, if either p or q is small and n is
not large, the approximation could be rather crude. A practical procedure
for determining the relative values of p and q for a given n if the
normal curve may be expected to represent the binomial is the following:

The mean of the distribution should be, say, three standard deviations
from the start. Thus we want

$$np > 3\sqrt{npq}$$
$$n^2 p^2 > 9npq$$
$$np > 9(1 - p)$$
$$p(n + 9) > 9$$
$$p > \frac{9}{n + 9}$$

If, for example, n = 50, then

$$p > \frac{9}{59} > .15$$
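The rule just derived can be written as a small function. This is a sketch only; the function names are hypothetical, and the multiple of three standard deviations is left as a parameter so that other choices may be tried.

```python
import math

def min_p_for_normal_approx(n, k=3.0):
    """Lower bound on p from np > k * sqrt(npq), i.e. p > k^2 / (n + k^2)."""
    return k * k / (n + k * k)

def mean_is_k_sd_from_start(n, p, k=3.0):
    """Check directly whether the mean np lies more than k standard deviations from 0."""
    return n * p > k * math.sqrt(n * p * (1 - p))

threshold = min_p_for_normal_approx(50)
print(round(threshold, 4))
print(mean_is_k_sd_from_start(50, 0.16), mean_is_k_sd_from_start(50, 0.15))
```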

In using the normal curve as an approximation, we proceed as follows
in the problem worked out above by the binomial expansion. Calculate:

$$\frac{X - np}{\sqrt{npq}} = \frac{15.5 - 10}{2.83} = 1.94$$

According to the normal table,

P = .025

This value is compared with P = .031 above.
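Both calculations of Example 2 can be reproduced with the standard library. The sketch below sums the binomial terms from t = 16 onward and, for comparison, evaluates the normal approximation with the half-unit correction used above; the function names are illustrative. Depending on how the normal deviate is rounded before the table is entered, the approximate probability may differ slightly in the third decimal from the value quoted in the text.

```python
from math import comb, erfc, sqrt

def binomial_upper_tail(n, p, k):
    """Exact probability of k or more successes in n trials."""
    q = 1 - p
    return sum(comb(n, t) * p ** t * q ** (n - t) for t in range(k, n + 1))

def normal_upper_tail(z):
    """Area of the unit normal curve to the right of z."""
    return 0.5 * erfc(z / sqrt(2))

n, p, k = 50, 0.2, 16
exact = binomial_upper_tail(n, p, k)
z = (k - 0.5 - n * p) / sqrt(n * p * (1 - p))   # (15.5 - 10) / 2.83
approx = normal_upper_tail(z)
print(round(exact, 3), round(z, 2), round(approx, 3))
```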

The Sampling Distribution of Differences between Percentages. Frequently,
the experimental results relate to the case of two samples where
it is desired to know whether the two samples may be regarded as random
samples from the same population. Thus:

(1) In sample 1 of size $n_1$, there are $t_1$ individuals that have the character
A.

(2) In sample 2 of size $n_2$, there are $t_2$ individuals that have the
character A.

Could the two samples be random samples from populations in which
p (the probability of character A occurring) is the same? Thus:

$$\frac{t_1}{n_1} \qquad \frac{t_2}{n_2}$$

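The theoretical or mathematical model against which such experimental
results may be compared can be built up as follows: Assuming
that $p_1 = p_2 = p$, the variation in $t_1$ in repeated samples of $n_1$ follows the
binomial $(q + p)^{n_1}$; similarly, the variation in $t_2$ in repeated samples of $n_2$
follows the binomial $(q + p)^{n_2}$.

$t_1$ varies about a mean of $n_1 p$ with a standard deviation $\sqrt{n_1 pq}$;
$t_1/n_1$ varies about a mean of p with a standard deviation $\sqrt{pq/n_1}$.

$$E\left[\frac{t_1}{n_1} - p\right] = 0$$

That is, the mean in repeated sampling is p;

$$E\left[\left(\frac{t_1}{n_1} - p\right)^2\right] = \frac{pq}{n_1}$$

Similarly,

$$E\left[\left(\frac{t_2}{n_2} - p\right)^2\right] = \frac{pq}{n_2}$$

Also,

$$E\left[\left(\frac{t_1}{n_1} - p\right)\left(\frac{t_2}{n_2} - p\right)\right] = 0$$

Consider:

$$d = \frac{t_1}{n_1} - \frac{t_2}{n_2} \qquad (3.30)$$

$$\sigma_d^2 = E\left[\left(\frac{t_1}{n_1} - p\right)^2 - 2\left(\frac{t_1}{n_1} - p\right)\left(\frac{t_2}{n_2} - p\right) + \left(\frac{t_2}{n_2} - p\right)^2\right] = \frac{pq}{n_1} + \frac{pq}{n_2} = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (3.31)$$

The ratio

$$\frac{d}{\sigma_d} = \frac{\dfrac{t_1}{n_1} - \dfrac{t_2}{n_2}}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad (3.32)$$

in repeated sampling will be approximately normally distributed about 0
with unit standard deviation. The normal model may therefore be used
for comparing the experimental results. The complete procedure is

1. Assume the hypothesis $p_1 = p_2 = p$.

2. Estimate p from the data; the maximum likelihood estimate is

$$\hat{p} = \frac{t_1 + t_2}{n_1 + n_2}$$

3. Calculate the ratio

$$\frac{\dfrac{t_1}{n_1} - \dfrac{t_2}{n_2}}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{q} = 1 - \hat{p}$$

4. Refer to the normal probability scale; consider whether the results
are compatible with the hypothesis.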
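The four steps of the procedure may be sketched as follows. The function name and the counts (30 of 100 in the first sample, 20 of 100 in the second) are hypothetical; the two-sided probability is read from the normal model by means of the standard-library error function.

```python
import math

def two_proportion_ratio(t1, n1, t2, n2):
    """Return (p_hat, ratio, two-sided P) for the difference t1/n1 - t2/n2."""
    p_hat = (t1 + t2) / (n1 + n2)          # step 2: pooled estimate of p
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    ratio = (t1 / n1 - t2 / n2) / se       # step 3: the ratio d / sigma_d
    p_two_sided = math.erfc(abs(ratio) / math.sqrt(2))   # step 4
    return p_hat, ratio, p_two_sided

p_hat, ratio, p_value = two_proportion_ratio(30, 100, 20, 100)
print(p_hat, round(ratio, 3), round(p_value, 3))
```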


Problems 

The following problems are designed to give the student an understanding
of sampling and sampling errors. The following normal population
of numbers with a mean of 30 and a variance of 100 may be used
for the exercises. Write or type each of the 100 numbers on a small
square of cardboard or stiff paper. Place the 100 pieces in a box and mix
thoroughly. Then draw one card at random from the box and record the
number on it. Return the card to the box. Mix the cards again, draw
a second card and record its number, and so on until a sample of a specified
size has been obtained.


Frequency Distribution of Numbers, X’s, in a Normal Population with $\mu = 30$;
$\sigma^2 = 100$

  X     f        X     f        X     f        X     f

 57     1       39     3       28     3       15     1
 53     1       38     2       27     3       14     1
 49     1       37     2       26     4       13     1
 48     1       36     3       25     3       12     1
 47     1       35     3       24     3       11     1
 46     1       34     3       23     2        7     1
 45     1       33     5       22     2        3     1
 44     1       32     2       21     3
 43     2       31     4       20     2
 42     3       30    10       19     3
 41     3       29     4       18     3
 40     2                      17     2
                               16     1




Total 100 
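Where cards are inconvenient, the drawing may be imitated on a computer. The sketch below builds the population from the frequency table, confirms its mean and variance, and draws 20 samples of 10 with replacement. It rests on stated assumptions: the unlabeled frequency of 2 in the table is taken to belong to X = 20, the sample variances use the divisor N - 1, and the names and the seed are illustrative.

```python
import random

# Frequency table of the population: value X -> frequency f.
# X = 20 is assumed for the entry whose value is missing in the table.
POPULATION_TABLE = {
    57: 1, 53: 1, 49: 1, 48: 1, 47: 1, 46: 1, 45: 1, 44: 1, 43: 2, 42: 3,
    41: 3, 40: 2, 39: 3, 38: 2, 37: 2, 36: 3, 35: 3, 34: 3, 33: 5, 32: 2,
    31: 4, 30: 10, 29: 4, 28: 3, 27: 3, 26: 4, 25: 3, 24: 3, 23: 2, 22: 2,
    21: 3, 20: 2, 19: 3, 18: 3, 17: 2, 16: 1, 15: 1, 14: 1, 13: 1, 12: 1,
    11: 1, 7: 1, 3: 1,
}
population = [x for x, f in POPULATION_TABLE.items() for _ in range(f)]

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def draw_sample(size, rng):
    """Draw cards one at a time, returning each card before the next draw."""
    return [rng.choice(population) for _ in range(size)]

rng = random.Random(1)
samples = [draw_sample(10, rng) for _ in range(20)]
means = [sum(s) / len(s) for s in samples]
variances = [sum((x - m) ** 2 for x in s) / (len(s) - 1)
             for s, m in zip(samples, means)]
print(mu, sigma2)
print([round(m, 1) for m in means])
```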


Exercise: Selecting 20 samples of 10 at random,

1. Compute 20 means.

2. Compute 20 variances.

3. Combine to make 10 random sets of paired values of the means; of the
variances.

4. Compute 10 $t$’s for differences between means of uncorrelated measures.

5. Take 10 samples of 5 in pairs and calculate the correlation coefficients.

6. Combine the results of the individual students in each case, form the
frequency distribution of the statistic, and plot the histogram. Calculate
the mean and standard deviation of each distribution and compare
with the population and expected values.
CHAPTER IV

THE TESTING OF STATISTICAL HYPOTHESES 

The Role of the Hypothesis in Scientific Investigations. In the well-developed
empirical sciences, scientific procedure is primarily concerned
with deriving predictions the validity of which is tested by the results of
experiments. The modern development of science has been much
facilitated by the practice of using hypotheses in planning and guiding
scientific inquiries. Even a casual study of the investigations of eminent
scientists reveals that they were guided in their work by some theory and
that this theory guided their observations and experiments. Where the
theory proved inadequate, it was modified. Occasionally it was abandoned
completely, but then another was sought to plan the action.
Significant experimentation requires the guidance of a hypothesis, and a 
successful experimenter does not collect observations unguided by theory 
to which the facts are related. Bacon maintained that if enough instances 
are gathered and tabulated correctly, the principle which explains them 
will simply emerge without any hypothesis about them having been 
formed. This contention has not been proved by the experimenter. 

The working hypothesis plays an important part in statistical research. 
It serves as a guide in planning the investigation; in determining what 
data to collect; in classifying, ordering, and reducing them; and finally 
as the basis for formulating the judgments with respect to it. 

Similar to the working plan of Newton, the scientist who did not
formulate hypotheses needlessly (“hypotheses non fingo”), or to the metaphysical
requirements of Ockham, the logician who considered it needless
to recur to many entities when it was possible to get along with fewer
ones (“nunquam ponenda est pluralitas sine necessitate”), the statistician’s
preferred method is to test the simplest hypothesis and to hold to a
minimum number of new quantities or constructs. Thus the preferred
hypothesis used by the statistician in the examination of his data is that
the apparent variations and the estimates of presumed effects may be
attributable to random sample errors or to fortuitous factors rather than
to the action of new causes. This hypothesis can be tested by the
application of the theory of errors. It will be recalled that the statistical
models previously described were constructed on the basis of sampling
errors. As long as experimental results conform to these models, the
hypothesis of chance (or, more specifically in this case, sampling errors)
being the cause of the observed effects is accepted.

The hypothesis that chance factors may have given rise to an observed
effect is frequently spoken of as the null hypothesis. This hypothesis is
met with in a number of different forms in research or statistical work. 
In experimentation, for example, it is often desirable to compare the 
effects of various methods of treatment or of production. The null 
hypothesis can in these cases be stated as follows: There is no difference 
in the outcomes of the several treatments, or, The outcomes are the same. 
This method is equivalent to determining whether or not the observed 
difference should be ascribed to random fluctuations or judged to be 
significant, that is, ascribed to the differential treatment. The null
hypothesis assumes the former alternative, which, if found to be incompatible
with the facts of observation, is then rejected. More generally,
the null hypothesis may be stated thus: Can the samples under examination
be regarded as having been randomly chosen from the same or similar
population?

General Theory of Testing Statistical Hypotheses. In Chapter III,
Sampling Distributions, it was stated that the statistician developed 
mathematical models against which the research worker could compare 
his experimental results and draw conclusions with respect to their 
significance. The process of determining statistical significance was said 
to consist in comparing the numerical data (or some function of them) 
obtained in a particular experiment with the model to establish whether 
or not they conform to the model. The name applied to the process of 
examining the significance of the data is the test of significance. In 
dealing with the sampling distribution we were concerned with testing 
the agreement between the distribution of our set of sample values and a 
theoretical distribution. In this case, we spoke of a test of the goodness 
of fit. 

More recently, we have come to speak of the problem of testing
statistical hypotheses and thus to speak of the test of significance relative
to the hypothesis in question. Before proceeding to illustrate the application
of these tests to some practical problems met with by the research
worker, we shall describe briefly the theoretical basis underlying current
procedures in testing a statistical hypothesis.

Suppose that a random variable X is the measurement of a certain
character and that a number of repeated measurements are made, say
N times. We thus obtain N random variables $X_1, X_2, \ldots, X_N$. The
N random variables are assumed to be independently distributed, and
the set of values is said to be a sample of N independent observations on
X. The sample of N observations may be represented as a sample point
E, in the N-dimensional space having as its coordinates $X_1, X_2, \ldots, X_N$.
The space in which the point lies may be called the sample space, W.

Assume that the distribution of X is normal but that the values of
some parameters $\theta_1, \ldots, \theta_q$ specifying the population are unknown.
Any assumption about the unknown parameters $\theta_1, \ldots, \theta_q$ may be
called a statistical hypothesis. The statistical hypothesis, $H_i$, is called a
simple hypothesis if it determines completely the values of all the q parameters,
for example, if it specifies that $\theta_1 = 1$, $\theta_2 = 3$, . . . . If the
hypothesis is consistent with more values than one for some parameter
it is called a composite hypothesis; for instance, the hypothesis that $\theta_1 = \theta_2$
for a distribution of X determined by two unknown parameters is a
composite hypothesis.

For simplicity, we shall consider the case of a single unknown parameter.
That is, let us assume that only one unknown parameter, $\theta$, is
involved in the distribution function of X and $\theta$, or $F(X_1, X_2, \ldots, X_n, \theta)$.
We wish to test the null hypothesis, $H_0: \theta = \theta_0$, against the only
admissible alternative hypothesis, $H_1: \theta = \theta_1$. For example, we may
test the significance of the deviation in the mean of a sample on the
basis of a random sample of N independent observations $X_1, \ldots, X_N$
from a normal population X. Then $H_0$ is the hypothesis that X is
normally distributed about the mean $\theta_0$ with standard deviation, $\sigma$, and
$H_1$ is the hypothesis that X is normally distributed about the mean $\theta_1$
with standard deviation, $\sigma$.

The testing of the statistical hypothesis involves the choice of a
region, w, called critical in the sample space W. It also involves the
decision to reject the hypothesis if and only if the sample point E falls
in w. Therefore, the test of the statistical hypothesis, $H_0$, consists in
rejecting $H_0$, when the sample point, E, falls within a specified critical
region, $w_0$, and in accepting $H_0$ (or at least not rejecting it) if the point
falls without $w_0$. The fundamental problem is, therefore, the specification
of the critical region, $w_0$.

The principle upon which the choice of the critical region depends
was first advanced by Neyman and Pearson (Ref. 3). It is based on the
control and minimizing of two kinds of error involved in testing the
hypothesis, $H_0$: (1) the unjust rejection of the hypothesis, described as
an error of the first kind, and (2) the failure to reject the hypothesis when,
in fact, it is incorrect, that is, when some other hypothesis, $H_1$, is true,
designated as an error of the second kind.

The probability of an error of the first kind determined by the
hypothesis under test, say $H_0$, is called the size of the corresponding
critical region, $w_0$, and is given by $P\{E \in w_0 \mid H_0\}$, that is, the probability
that E, as determined by the observational values, will fall within the
region, $w_0$, as determined by the hypothesis, $H_0$. This probability may
be designated by $\alpha$.

The probability of an error of the second kind is $P\{E \in (W - w_0) \mid H_1\}$
where $(W - w_0)$ is the set of all sample points outside $w_0$. It may be
specified as $\beta$. Its complement, $1 - \beta$, is called the power of the test with
respect to $H_1$.
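For the example of a normal mean with known standard deviation, the two probabilities of error can be computed directly. The sketch below assumes a one-sided critical region (large values of the sample mean) and hypothetical values $\theta_0 = 30$, $\theta_1 = 35$, $\sigma = 10$, N = 16, $\alpha = .05$; the function name is illustrative, and `statistics.NormalDist` from the Python standard library supplies the normal areas.

```python
import math
from statistics import NormalDist

def error_probabilities(theta0, theta1, sigma, n, alpha):
    """One-sided test of H0: mean = theta0 against H1: mean = theta1 > theta0.

    Returns the critical value of the sample mean, the probability beta of an
    error of the second kind, and the power 1 - beta.
    """
    se = sigma / math.sqrt(n)                    # standard error of the mean
    critical = NormalDist(theta0, se).inv_cdf(1 - alpha)
    beta = NormalDist(theta1, se).cdf(critical)  # sample mean falls outside w0 under H1
    return critical, beta, 1 - beta

critical_mean, beta, power = error_probabilities(30, 35, 10, 16, 0.05)
print(round(critical_mean, 3), round(beta, 3), round(power, 3))
```

Lowering $\alpha$ moves the critical value upward and so increases $\beta$, which illustrates why both risks cannot be made small at once for a fixed sample size.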

Neyman and Pearson (Ref. 4), assuming that $P(X_1, \ldots, X_n \mid H_0)$
and $P(X_1, \ldots, X_n \mid H_1)$ are the probability laws of the X’s as fixed
by the hypothesis, $H_0$, being tested, and by $H_1$, a single alternative, have
shown that the region $w_0$, established by the inequality

$$P(X_1, \ldots, X_n \mid H_1) > k P(X_1, \ldots, X_n \mid H_0) \qquad (4.01)$$

where k > 0 is a constant selected such that

$$P\{E \in w_0 \mid H_0\} = \alpha \qquad (4.02)$$

is the best critical region with regard to $H_1$ having size $\alpha$. The critical
region, which provides the most powerful test with respect to $H_1$, is called
the best critical region for $H_0$ with respect to $H_1$.

This theory of testing statistical hypotheses is based on the simple
principle of arranging the test, that is, of choosing the critical region, w 0,
so as to minimize the probability of errors of the second kind while keeping
the probability of errors of the first kind constant. The size of the
critical region is then determined by α and its power is designated as
1 − β. It is obviously impossible to make both α and β arbitrarily small
with a sample of fixed size.
The decision of just how the balance between the two kinds of errors
should be struck must be made by the investigator and will presumably
be based on the relative importance of the two kinds of error in the
particular situation. It is the function of statistical theory to show how
the two risks of error may be controlled and minimized.

In practice, the investigator controls the first kind of error by choosing 
as the value of α, the boundary of the critical region, a specified level of
significance, say the 5 per cent, 1 per cent, or 0.1 per cent point value 
of the criterion. The level is decided upon at the time of designing the 
investigation and depends on the nature of the problem and the risk in 
error the investigator is willing to accept. The custom is to reject the 
hypothesis tested if the observed value of the criterion is greater than 
(lies beyond, usually) the 1 per cent point, to remain in doubt if it lies 
between the 5 per cent and 1 per cent points, and to accept the hypothesis 
if the criterion is less than the 5 per cent point. With respect to the 
control of the second type of error, studies of the power function of tests 
have been made and tables are available for securing the probability of 
errors of the second kind in some instances. Neyman and Tokarska 
(Ref. 5) have compiled tables for use in determining the probability of 
errors of the second kind in testing Student’s hypotheses. Tang (Ref. 6) 
has tabled the power function for the test of general linear hypotheses, 
which reduces to Fisher’s z-test. Lehmer (Ref. 2) has prepared further 
tables for detecting the probability of errors of the second kind in dealing 
with linear hypotheses. Eisenhart (Ref. 1) investigated the power 
function of the χ²-test.

The relation between the probabilities of the two kinds of error
involved in testing the hypothesis, H 0: θ = θ 0, against the alternative,
H 1: θ = θ 1, is illustrated in Fig. 4.
The probability of accepting the hypothesis, H 0: θ = θ 0, when it is
true is given by 1 − α. That is, α, the size of the critical region, w 0, is
the area under the θ 0-curve to the right of the ordinate erected at
X = X 0; the probability of accepting the hypothesis, H 1: θ = θ 1, when
it is true is given by 1 − β, the area under the θ 1-curve which lies to
the right of the ordinate at X = X 0.
The quantity 1 − β relative to θ 0, θ 1, and α as defined previously is the
power of the test which specifies w 0 as the critical region. Hence, α
and β represent the probabilities of the first and second kinds of error,
respectively.



Figure 4. Normal distributions of the univariates p(x, θ 1) and p(x, θ 0) with critical
regions for testing alternative hypotheses relative to the mean.

Neyman and Pearson use a criterion based on the principle of likelihood
as the basis for accepting or rejecting a given hypothesis. In the
case of the hypothesis tested above, H 0, the ratio

$$\frac{P_0(X_1, X_2, \ldots, X_n)}{P_1(X_1, X_2, \ldots, X_n)} \tag{4.03}$$

is designated as the likelihood of the hypothesis, H 0, as tested against the
single alternative hypothesis, H 1.

In accordance with Equation (4.03), a most powerful region, w, is
comprised of all points which satisfy the inequality

$$\frac{P(X_1, \ldots, X_n \mid H_1)}{P(X_1, \ldots, X_n \mid H_0)} \ge k \tag{4.04}$$

where k is selected so that the region should have the required size α
as indicated in Equation (4.02). For example, the principle for choosing
the critical region, w 0, may be applied to the case of testing the significance
of a mean of a sample from a normal population, where H 0: θ = θ 0 and
H 1: θ = θ 1. We specify that the critical region required and defined by
the inequality [Equation (4.04)] has the size α = .01.

Since, under the hypothesis H 0, the variate

$$\frac{1}{N}\sum_{\alpha=1}^{N}(X_\alpha - \theta_0) \tag{4.05}$$

is normally distributed about a mean of zero with variance 1/N, k can
be read from a normal table:

$$k = \frac{2.326}{\sqrt{N}} \tag{4.06}$$

The most powerful region of size .01 is then

$$\frac{1}{N}\sum_{\alpha=1}^{N}(X_\alpha - \theta_0) \ge \frac{2.326}{\sqrt{N}} \tag{4.07}$$

that is, the test specified by the region (4.07) is most powerful with regard
to all alternatives, θ > θ 0.

If the probability of an error of the first kind, α, and of the second
kind, β, is specified in a given problem, it is possible to determine the
minimum size of sample, N, for which the power of the most powerful
region of size α is equal to or greater than 1 − β. For testing H 0 against
H 1, for instance, the minimum number of observations is equal to the
smallest positive integer, N, for which

$$\beta_N(\alpha) \le \beta \tag{4.08}$$

where β N(α) denotes that for a fixed N, β is a single-valued function of α.
For example, (1) if the arithmetic mean, X̄, of a predetermined number
of N observations is less than or equal to a properly selected constant, k,
the hypothesis being tested, H 0, is accepted; and (2) if X̄ > k, the hypothesis,
H 0, is rejected. N and k are determined such that the probability
of (1) is equal to 1 − α when θ = θ 0 and is equal to β when θ = θ 1.
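
The search for the smallest N satisfying (4.08) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a normal population with known standard deviation, a one-sided test that rejects H 0 when the sample mean exceeds its cutoff, and θ 1 > θ 0. The function names `beta_n` and `min_sample_size` and the numerical inputs are hypothetical and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def beta_n(n, theta0, theta1, sigma, alpha):
    """Probability of an error of the second kind for a sample of size n.

    Assumes a one-sided test on a normal mean with known sigma that
    rejects H0 when the sample mean exceeds theta0 + z_alpha*sigma/sqrt(n).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (theta1 - theta0) * sqrt(n) / sigma
    return std_normal.cdf(z_alpha - shift)


def min_sample_size(theta0, theta1, sigma, alpha, beta, n_max=100_000):
    """Smallest N for which beta_N(alpha) <= beta, as in inequality (4.08)."""
    for n in range(1, n_max + 1):
        if beta_n(n, theta0, theta1, sigma, alpha) <= beta:
            return n
    raise ValueError("no sample size up to n_max meets the requirement")


if __name__ == "__main__":
    # Hypothetical inputs: unit variance, alternatives half a unit apart.
    print(min_sample_size(0.0, 0.5, 1.0, alpha=0.01, beta=0.05))
```

The direct search mirrors the wording of the text (the smallest positive integer N meeting the inequality) rather than relying on a closed-form expression.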

Sequential Test of a Statistical Hypothesis. Recently a test has been 
developed whereby the number of observations is not predetermined but 
is kept as a random variable. Instead of deciding in advance the number 
of items to be included in a sample, the data are analyzed continuously 
as they are being collected (Ref. 7). In such cases where it is possible 
to examine the data as they originate, as in some manufactured products, 
the sequential probability-ratio test frequently uses half as many observa¬ 
tions as the current most powerful test. Briefly, the principal properties 
of the sequential test are as follows: 

(1) The procedure by which a sequential test of a statistical hypothe¬ 
sis is carried out depends on the following rule of behavior: 

(a) To accept the null hypothesis being tested. 

(b) To reject the hypothesis. 

(c) To suspend judgment, that is, to continue the analysis by making 
an additional observation. 


The test procedure is kept up sequentially until either decision (a) or (b) 
is made. 

(2) If α is the probability that when H 0 is true, the alternative
hypothesis, H 1, will erroneously be accepted, and if β is the probability
that when H 1 is true, H 0 will falsely be accepted, then it is necessary that
α + β < 1. Sequential analysis determines in the course of the analysis
whether or not the data justify a decision with a risk in error of judgment
as small as α or β. The number of observations necessary will, on the
average, depend on how small α and β are made; also on how fine a distinction
is made between H 0 and H 1.

(3) The fundamental criterion basic to the decision in (1) is the likelihood
ratio, L, which is the ratio of the probability that the one hypothesis
truthfully specifies the origin of the observed data to the probability that
the alternative hypothesis does. Taking L as the ratio of the probability
of the data under H 1 to that under H 0, the value of L required to accept
H 0 is β/(1 − α); that required to accept H 1 is (1 − β)/α. L is computed
after each observation and is compared with the critical values necessary
for a decision.
These values of L are independent of the number of observations. Since
the likelihood ratio, as used in sequential tests, is a continuing product,
considerable saving in calculation results by using log L instead of L.

In practice, the quantities α and β are usually taken as quite small,
rarely greater than .05 and frequently .01 or less.
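
The rule of behavior described in (1) to (3) can be sketched as follows. This is a minimal illustration, not the procedure of Ref. 7 itself: it assumes normal observations with a known common standard deviation, takes L as the ratio of the likelihood under H 1 to that under H 0, and uses Wald's approximate boundaries β/(1 − α) and (1 − β)/α. The function name `sprt_normal_mean` and the data are hypothetical.

```python
from math import log


def sprt_normal_mean(observations, theta0, theta1, sigma, alpha, beta):
    """Sequential probability-ratio test for a normal mean (known sigma).

    Works with log L, the running sum of the log-likelihood ratios of
    H1 to H0, and stops as soon as a boundary is crossed.
    Returns (decision, number of observations used).
    """
    lower = log(beta / (1 - alpha))   # accept H0 at or below this value
    upper = log((1 - beta) / alpha)   # accept H1 at or above this value
    log_l = 0.0
    for n, x in enumerate(observations, start=1):
        # log of f(x | theta1) / f(x | theta0) for one observation
        log_l += (theta1 - theta0) * (x - (theta0 + theta1) / 2) / sigma**2
        if log_l <= lower:
            return "accept H0", n
        if log_l >= upper:
            return "accept H1", n
    return "continue sampling", len(observations)


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(sprt_normal_mean([1.5, 1.5, 1.5, 1.5], 0.0, 1.0, 1.0, 0.05, 0.05))
```

Accumulating log L by addition is the saving in calculation referred to in (3): the product of likelihood ratios becomes a running sum.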
CHAPTER V


CURRENT PROCEDURES IN TESTING STATISTICAL 
HYPOTHESES 

Up to this point, we have defined a number of statistical models 
against which the research worker may compare his experimental results. 
We have also discussed the theoretical formulation and solution of the 
problem of testing statistical hypotheses. Our purpose now is to show
how, when faced with some practical problem, the research worker may
utilize the principles underlying the theory in deciding which model,
if any, is applicable in his particular problem, and how
to choose it intelligently and effectively. This chapter will be devoted
to illustrating ways of solving a number of problems most of which are of 
frequent occurrence. 

Problem V.l. The significance of a mean from a known normal 
population. The simplest case of testing significance is in the problem 
where the population is known, that is, the population parameters, the 
mean and standard deviation, are known and the quantity whose sig¬ 
nificance we are interested in testing may be assumed to be normally 
distributed in the population. Specifically, the question is: Could this 
sample be a random sample from that population? Such, for instance, is 
the problem of determining whether or not a given sample of pupils to 
whom an intelligence test has been given could be regarded as a random 
sample from the population upon whom the norms of the test were set 
by the author. 

Assume that it is known that for a particular intelligence test the 
I.Q.’s are normally distributed about a mean of 100 with a standard 
deviation of 17 I.Q. points. The test is administered to a class of 36 
pupils who in other respects appeared to belong to this population. The 
mean I.Q. for the class was found to be 108. May we conclude that the 
class is a random sample from the specified population—that the mean 
ability of the class is the same as that of the population?

To answer this question we need to determine what model should be 
used with which the experimental result can be compared. It is known 
from sampling theory (page 36) that the means of samples of 36 cases 
drawn at random from this population will be normally distributed about 
the population mean, 100 I.Q. points, with a standard deviation (or
standard error) equal to σ/√N = 17/√36 = 2.833 I.Q. points. We found a
mean of 108 I.Q. points. How often should we expect to find a mean as 
high as this or higher in repeated sampling from this population? 
The answer is obtained by referring to the normal probability table
(Table I, Appendix). To enter this table we must convert the raw score
to a standard measure. Thus:

$$z = \frac{108 - 100}{2.833} = 2.82$$


From the table, we find that in repeated sampling from this population 
we should expect to find a value as high as or higher than the one obtained 
in 1 — .9976, or 0.24 per cent, of cases. This probability is lower than 
the level of 1 per cent which we decided to use. Therefore, we conclude 
that the sample could not have been drawn from the specified population. 
We are aware that in making this statement we shall be wrong in 0.24 per 
cent of the cases; but this is a risk we are willing to run. Such is the 
statistical conclusion. The educational conclusion is that the mean
ability of the class tested is significantly above the norm specified for the 
population. 
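
The arithmetic of Problem V.1 can be checked with a short Python sketch. It assumes only what the problem states (a normal population with known mean and standard deviation) and uses the standard library in place of Table I; the function name `z_test_upper` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(sample_mean, pop_mean, pop_sd, n):
    """Standard error, standard measure z, and upper-tail probability
    for a sample mean drawn from a known normal population."""
    se = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / se
    p_upper = 1 - NormalDist().cdf(z)
    return se, z, p_upper


if __name__ == "__main__":
    # Values from Problem V.1: mean 100, S.D. 17, N = 36, observed mean 108.
    se, z, p_upper = z_test_upper(108, 100, 17, 36)
    print(round(se, 3), round(z, 2), round(p_upper, 4))
```

The upper-tail probability is compared with the 1 per cent level chosen in advance, exactly as the table lookup in the text does.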

Problem V.2. The significance of a mean from an unknown normal 
population. We shall take next the problem in which the population 
mean is known, or specified by hypothesis to be some value, say 100 I.Q. 
points, but in which the population standard deviation is not known. 

We gave an intelligence test to a class in Grade 6 consisting of 26
pupils. The mean I.Q.-score on the test was 93. We want to know if
our class may be assumed to be a random sample from a population whose
mean, μ, equals 100 I.Q. points. To answer this question we need to
compare our result with the appropriate model, which in this case must
be the distribution of t (see page 43), since we do not know the population
standard deviation. Therefore, we calculate the value of t, say t 0,
for our sample and compare it with the t-model. If we find that the
probability of getting a value of t greater than or equal to ±t 0 in repeated
sampling is less than 1 in 100, then we conclude that the sample could
not have been drawn from this population.

The necessary calculations and procedures are as follows:

$$N = 26; \qquad \bar{X} = 93; \qquad s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = 144$$

The value of t 0 is

$$t_0 = \frac{\bar{X} - \mu}{s/\sqrt{N}} = \frac{-7}{2.353} = -2.97$$

We compare this value of t 0 with the model as given in the table of the
t-distribution (Table II, Appendix). We enter the row of the table
corresponding to n = N − 1, that is, n = 25 in our example. For
samples of 26 (N = 26), we expect to find values of t greater than or equal
to ±2.787 in 1 per cent of cases; so, clearly, we should expect to find
values greater than or equal to ±t 0 = ±2.97 in repeated sampling from
this population in an even smaller percentage of cases. Our conclusion,
therefore, is that it is unlikely that our sample was drawn from a population
in which the mean I.Q. was 100, or, that the mean ability of the class
is significantly different from the norm of 100.
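
A minimal Python sketch of the calculation in Problem V.2 follows. It works from the summary values given in the problem and takes the 1 per cent point, 2.787 for 25 degrees of freedom, from the text rather than computing it; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean, hyp_mean, variance, n):
    """t for a sample mean against a hypothesized population mean.

    `variance` is the unbiased estimate, sum of squares / (N - 1).
    Returns (t, degrees of freedom).
    """
    se = sqrt(variance / n)
    return (sample_mean - hyp_mean) / se, n - 1


if __name__ == "__main__":
    t0, df = one_sample_t(93, 100, 144, 26)
    T_01_DF25 = 2.787  # two-sided 1 per cent point quoted in the text
    print(round(t0, 2), df, abs(t0) >= T_01_DF25)
```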

Problem V.3. The significance of a mean from a small finite population.
In most sampling problems a large population exists or is assumed
to exist. At times the problem arises of using a sample which may
comprise an appreciable part of a relatively small finite population
sampled. The standard error of the mean is then adjusted as follows:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} \tag{5.01}$$

This adjustment follows from the fact that sampling errors affect only
the estimate of that fraction of the whole which is not included in the
sample. The value of σ, the population standard deviation, is usually
unknown, but the unbiased estimate of it can be obtained from the
sample. N is the size of the population; n, the number of sampling units.

For example, suppose a sample of 50 female students has been drawn 
at random from the 500 female freshmen enrolled in a university. We 
wish to test the hypothesis that the mean height of the 500 freshmen is 
equal to 168 cm. 

We calculated the following statistics for the sample of 50:

$$\bar{X} = 164.8 \text{ cm}, \qquad s_X = 5.9 \text{ cm}$$

Then

$$s_{\bar{X}} = \frac{5.9}{\sqrt{50}}\sqrt{\frac{500 - 50}{499}} = .79$$

$$t_0 = \frac{164.8 - 168}{.79} = -4.05$$

We compare the value t 0 with the t-model. Entering the table of the
t-distribution (Table II, Appendix) with 49 degrees of freedom (the
sample size less one), we find that for 40 degrees of freedom
−t .0005 = −3.551 and that for 60 degrees of freedom −t .0005 = −3.460.
Since our value is numerically greater than the tabled values, we may reject
the hypothesis that the mean height of the 500 freshmen is equal to 168
cm. If the statement that μ = 168 were true, we should expect that in
repeated sampling, 50 students selected at random from the 500 would
give a mean as divergent as 164.8 less than once in 2000 trials.
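
The adjustment of Formula (5.01) is easy to verify numerically. The sketch below assumes the sample standard deviation is used in place of σ, as the problem does; the function name `fpc_standard_error` is hypothetical.

```python
from math import sqrt


def fpc_standard_error(s, n, pop_size):
    """Standard error of a mean with the finite-population adjustment
    of Formula (5.01): (s / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    return s / sqrt(n) * sqrt((pop_size - n) / (pop_size - 1))


if __name__ == "__main__":
    # Values from Problem V.3: s = 5.9 cm, n = 50 drawn from N = 500.
    se = fpc_standard_error(5.9, 50, 500)
    t0 = (164.8 - 168) / se
    print(round(se, 2), round(t0, 2))
```

Carrying the unrounded standard error forward gives a value of t that differs slightly in the second decimal from the one obtained with the rounded .79; the conclusion is unaffected.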

Problem V.4. The significance of the difference between means.
More frequently the problem is that of testing whether or not there is a
significant difference between means, that is, whether or not the samples
may be regarded as random samples from the same normal population;
in other words, to test the hypothesis that the true difference between
means is zero.

The experimental results may also in this case be compared with the
model t-distribution in making the test of significance (see page 47),
because (1) the difference between two means may be regarded as normally
distributed about zero (if the hypothesis is true) with a standard
deviation σ, and (2) the standard error of the difference estimated on the
number of degrees of freedom provides an independent estimate of σ.
Since t, in general, is the ratio of (1) to (2), it is the appropriate criterion
for the test of the hypothesis involved here.

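The following calculations and procedures enable us to make the
determination of t 0 in this particular case (the subscripts refer to the
corresponding sample):

$$\bar{X}_1 = \frac{\sum X_1}{N_1}, \qquad \bar{X}_2 = \frac{\sum X_2}{N_2}$$

$$s^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2}$$

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}}$$

Refer t 0 to the table of t (Table II, Appendix).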

Let us illustrate by taking the problem of testing whether two sets, A and B,
of test scores from two classes in algebra may be regarded as having come
from the same normal population. We obtain the following values:

                     Class A          Class B
  N                  N 1 = 34         N 2 = 30
  ΣX                 975              795
  X̄                  28.68            26.50
  Σ(X − X̄)²          4327.4           2969.5

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2}\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}}
= \frac{28.68 - 26.50}{\sqrt{\dfrac{4327.4 + 2969.5}{34 + 30 - 2}\left(\dfrac{1}{34} + \dfrac{1}{30}\right)}}
= \frac{2.18}{\sqrt{117.69 \times .062745}} = .802$$


We enter the t-table in the row corresponding to n = N 1 + N 2 − 2,
to find the probability of obtaining a value of t greater than or equal to
±t 0 in repeated sampling. In the example, n = 34 + 30 − 2 = 62, but
this specific value is not given in the table. It is observed from the values
for n = 60 and n = 120 that the probability of getting a value of t
greater than or equal to ±0.802 in repeated sampling is somewhere
between .40 and .50. We conclude, therefore, that the two classes may
be assumed to be random samples from the same normal population or, in
other words, that the means of the two classes are not significantly different.
The pedagogical conclusion is that there is no real difference
between the average algebraic abilities of the two classes as measured by
the test used.
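
The pooled calculation of Problem V.4 can be reproduced from the summary values alone. The sketch below is an illustration, not part of the original procedure; the function name `pooled_t` is hypothetical, and the rounded means quoted in the problem are used so that the result agrees with the worked figure.

```python
from math import sqrt


def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """t for the difference between two means with a pooled variance.

    ss1 and ss2 are the sums of squared deviations about each sample
    mean. Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se, df


if __name__ == "__main__":
    # Classes A and B of Problem V.4.
    t0, df = pooled_t(28.68, 4327.4, 34, 26.50, 2969.5, 30)
    print(round(t0, 3), df)
```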

The hypothesis tested above, that the two samples were random
samples from the same normal population, is equivalent to testing the
hypothesis, H 1, that μ 1 = μ 2 and σ 1² = σ 2², against the set of all the
alternative hypotheses which specify only that μ 1 ≠ μ 2 or σ 1² ≠ σ 2² or
both. General results have been obtained by Sato (Ref. 13, page 1) to
indicate that the t-test is also the uniformly most powerful of all the
unbiased exact tests that can possibly be made for the hypothesis, H 2:
In connection with two uncorrelated normal populations, π 1 and π 2, it is
assumed as given that σ 1 and σ 2 have the same (though unknown) value,
to test the hypothesis that μ 1 = μ 2 against the set of alternatives that
μ 1 ≠ μ 2. The study of the power function of the t-test under different
conditions has been made by Hsu (Ref. 13).

We shall next consider a practical problem which occasionally arises
when there is evidence to indicate that the variances of the two populations
from which random samples have been drawn are unequal and it is
desired to test the significance of the difference between the means.

Problem V.5. The significance of the difference between means
when the variances are unequal or unknown. A precise test of significance
for the difference between the means of two samples not supposed
to be drawn from equally variable populations, or from populations
having a known variance ratio, was first given by Behrens (Ref. 3).
For this purpose the Behrens-Fisher method is available (Refs. 7, 8, 9,
22). Its application is made in the following example.

At the end of a certain course in science, two groups, one in U High 
School and one in B High School, took the Peterson Comprehensive 
Science Examination (Ref. 18). The following results were recorded: 

School U: N 1 = 14, X̄ 1 = 73.21, (S.D.) 1 = 21.53,
          or Σ(X 1 − X̄ 1)² = 6489.5726

School B: N 2 = 12, X̄ 2 = 56.30, (S.D.) 2 = 16.75,
          or Σ(X 2 − X̄ 2)² = 3366.7500

From these data we obtain:

(a) F = 1.6314, n 1 = 13, n 2 = 11: .30 > P > .20 (not significant)
(b)

$$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2}{N_1(N_1 - 1)} + \dfrac{\sum (X_2 - \bar{X}_2)^2}{N_2(N_2 - 1)}}} = 2.162 \tag{5.02}$$

(c) For n 1 = 13, n 2 = 11, mean-variance ratio = 1.6314 × 12/14:

$$\theta = \tan^{-1}\sqrt{1.6314 \times \tfrac{12}{14}} = \tan^{-1}(1.1825) = 49^\circ 47'$$


We enter Sukhatme's table of the d-function (Ref. 22). We have
n 1 = 13, n 2 = 11, θ = 49°47′. No d .05 (or d .01) is given for these values.
So we must find d .05 to fit these values. We may interpolate for either
n 1 or n 2 first; the result will be the same in either case. Here we
shall interpolate for n 1 first. For n 1 = 13 we get the following d .05
values:

  θ           0°      15°     30°     45°     60°     75°     90°
  n 2 = 8     2.306   2.293   2.261   2.225   2.196   2.176   2.170
  n 2 = 12    2.179   2.175   2.167   2.163   2.164   2.168   2.170

Now we interpolate for n 2 = 11 and get the following d .05 values:

  n 1 = 13, n 2 = 11

  θ           0°      15°     30°     45°     60°     75°     90°
  d .05       2.190   2.185   2.175   2.169   2.167   2.169   2.170

For n 1 = 13, n 2 = 11, θ = 45°: d .05 = 2.169
For n 1 = 13, n 2 = 11, θ = 60°: d .05 = 2.167

Since our observed d 0 = 2.162 is less than either of these d .05 values and
since our value of θ, 49°47′, is between 45° and 60°, there is no need for
interpolating for θ. We now may declare our observed value of d
nonsignificant at the 5 per cent level.

It is worth noting that if we had used the usual t-test for the hypothesis
of equal means, the hypothesis would have been rejected at the 5 per
cent level. Thus:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2}\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}} = 2.121 \tag{5.03}$$

For n = N 1 + N 2 − 2 = 24, P < .05
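
The quantities d, θ, and the ordinary pooled t for the two schools can be recomputed with the following Python sketch. It is offered only as a check on the arithmetic; the function names are hypothetical, and the critical values of d must still be taken from Sukhatme's table, which the code does not reproduce.

```python
from math import atan, degrees, sqrt


def behrens_fisher_d(mean1, ss1, n1, mean2, ss2, n2):
    """d of Formula (5.02) and the angle theta, in degrees.

    theta is the arc tangent of the square root of the ratio of the
    two variances of the means.
    """
    var_mean1 = ss1 / (n1 * (n1 - 1))
    var_mean2 = ss2 / (n2 * (n2 - 1))
    d = (mean1 - mean2) / sqrt(var_mean1 + var_mean2)
    theta = degrees(atan(sqrt(var_mean1 / var_mean2)))
    return d, theta


def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Ordinary t of Formula (5.03), which assumes equal variances."""
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (n1 + n2) / (n1 * n2))


if __name__ == "__main__":
    school_u = (73.21, 6489.5726, 14)
    school_b = (56.30, 3366.7500, 12)
    d, theta = behrens_fisher_d(*school_u, *school_b)
    t = pooled_t(*school_u, *school_b)
    print(round(d, 3), round(theta, 2), round(t, 3))
```

The angle comes out near 49.78°, that is, about 49°47′, in agreement with step (c).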


An approximate method for the same problem was proposed by
Cochran and Cox (Ref. 21), a method to test the hypothesis of equality
of means with no hypothesis about the population variance when N 1 ≠ N 2
and s 1 ≠ s 2. In this test the variance of each mean is calculated separately.
A criterion t is obtained by computing a weighted mean of the
two tabled t-values for the two samples, the weights being the two
variances of the respective means. The ratio

$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}}$$

is then compared with the weighted t-value to judge the significance.


The approximate test has been applied to the same data analyzed
above by the Behrens-Fisher formula. The calculations are set forth
in Table 20.

TABLE 20

Calculations for the Cochran-Cox Method of Testing the Significance of the
Hypothesis of Equality of Means with No Hypothesis about the Population
Variance

The criterion (weighted t) is

$$\frac{t_{.05}\, s_{\bar{X}_1}^2 + t_{.05}\, s_{\bar{X}_2}^2}{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \frac{77.0191 + 56.1380}{35.6570 + 25.5057} = 2.177$$


The observed t is calculated as follows:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}} = \frac{73.21 - 56.30}{\sqrt{35.6570 + 25.5057}} = 2.162$$

Since the observed t is less than the criterion t, that is, since 2.162 < 2.177,
the hypothesis of equal means is not rejected. Thus, the same conclusion
is found as in the case of the exact test provided by the Behrens-Fisher
formula.
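
The Cochran-Cox comparison can be sketched in Python as below. The two 5 per cent values of t, 2.160 for 13 degrees of freedom and 2.201 for 11, are assumed here from an ordinary t-table; they are consistent with the products 77.0191 and 56.1380 shown in the criterion. The function name `cochran_cox` is hypothetical.

```python
from math import sqrt


def cochran_cox(mean1, ss1, n1, t_crit1, mean2, ss2, n2, t_crit2):
    """Observed t and weighted criterion t for the Cochran-Cox test.

    t_crit1 and t_crit2 are the tabled t-values for N1 - 1 and N2 - 1
    degrees of freedom; the weights are the variances of the two means.
    """
    var_mean1 = ss1 / (n1 * (n1 - 1))
    var_mean2 = ss2 / (n2 * (n2 - 1))
    observed = (mean1 - mean2) / sqrt(var_mean1 + var_mean2)
    criterion = (t_crit1 * var_mean1 + t_crit2 * var_mean2) / (
        var_mean1 + var_mean2
    )
    return observed, criterion


if __name__ == "__main__":
    observed, criterion = cochran_cox(
        73.21, 6489.5726, 14, 2.160,   # School U, t(.05) for 13 d.f. (assumed)
        56.30, 3366.7500, 12, 2.201,   # School B, t(.05) for 11 d.f. (assumed)
    )
    print(round(observed, 3), round(criterion, 3), observed < criterion)
```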


Where the sizes of the samples are the same, that is, where N 1 = N 2,
the significance of the difference between the means can be determined,
even though the variances differ, by calculating the value of t in the usual
way applying Formula (5.03). However, the t-table is entered with
d.f. = N 1 − 1 (= N 2 − 1) instead of N 1 + N 2 − 2.

Problem V.6. The significance of the difference between the means 
of correlated measures. Situations arise in which the two samples are 
equal in number and in which each individual of one sample corresponds 
in some way to a particular individual of the second sample. Such is the 
case, for example, when individuals have been paired or equated on 
certain characteristics, in two different groups. One group is then 
subjected to one type of treatment and the second to another. At the 
end of the experimental period, evidence is obtained as to whether a 
differential effect has resulted. In this case and in others of a like kind, 
we can use the distribution of t as the theoretical model. It is necessary, 
however, to calculate t 0 in a way different from that illustrated in Problem
V.4. As before, we wish to determine whether the two groups may be
regarded as random samples from the same normal population or to test 
the null hypothesis that the two means are the same. The individuals in 
the two groups have been equated, a fact that must be taken into account 
when setting up the model. If there is no differential effect, then clearly 
the difference between the criterion measures of the paired individuals 
should be zero. 


In practice, as has been noted, in taking means of samples from the 
same population, the differences will never be exactly zero, even if there 
is no differential effect. The distribution of differences between cor¬ 
responding values now constitutes a single sample, for which the mean 
difference and the standard error of the mean difference can be calculated 
in the usual manner. The ratio of the mean difference to its standard 
error will be distributed as t in repeated sampling. Therefore, the 
distribution of t is the theoretical model against which to check the 
experimental results. 

The following data were obtained in an experiment to compare the 
efficacy of two methods of teaching elementary algebra to high-school 
classes. One group was taught by the group method, the other by the 
individual method. The individuals constituting each of the 25 pairs 
were equated on the basis of intelligence test scores and mathematical 
pretests. 

Our problem is to test the null hypothesis that there is no difference 
between the two teaching methods with respect to the outcomes meas¬ 
ured. This is equivalent to determining from the experimental data 
whether the mean scores on the criterion of the two groups are the same, 
that is, whether the two classes may be assumed to be random samples 
from the same normal population. If it is found that the mean scores 
are significantly different, the conclusion will be drawn that there is evi¬ 
dence of a differential effect between the two methods of teaching. 1 

The data are recorded in Table 21. We first calculate the differences 
between the scores made by the paired individuals. These differences 
are given in column (4). We then find the mean of the distribution of 
differences: 


Mean difference,

$$\bar{D} = \frac{\sum D}{N} = \frac{232}{25} = 9.28$$

We next calculate the variance of the differences:

$$s_D^2 = \frac{N\sum D^2 - (\sum D)^2}{N(N - 1)} = \frac{25 \times 8962 - (232)^2}{(25)(24)} = 283.71$$

1 For a rigorous discussion of the single-factor experiment, see page 286. 
TABLE 21

Calculations for Tests of Significance of Difference in Paired Groups by
Two Methods

[Rows for pairs I through XXV: individual scores not recoverable.]

Check: X̄ = −108/25 + 50 = 45.68
Check: Ȳ = −340/25 + 50 = 36.40
Check: X̄ − Ȳ = 45.68 − 36.40 = 9.28

TABLE 21 (Continued)

Method 1

$$\sigma_{\bar{D}} = \sqrt{\frac{N\sum D^2 - (\sum D)^2}{N^2(N - 1)}} = \sqrt{\frac{25 \times 8962 - (232)^2}{25^2 \cdot 24}} = 3.368$$

$$t_0 = \frac{9.28}{3.37} = 2.754$$

n = N − 1 = 24
P < .05, or .02 > P > .01

Method 2

$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_X^2}{N} + \frac{\sigma_Y^2}{N} - 2 r_{XY}\frac{\sigma_X \sigma_Y}{N}} = \sqrt{\frac{N\sum x^2 + N\sum y^2 - 2N\sum xy}{N^2(N - 1)}}$$

NΣx² = NΣX² − (ΣX)² = (25)(12,526) − (−108)² = 301,486
NΣy² = NΣY² − (ΣY)² = (25)(11,974) − (−340)² = 183,750
2NΣxy = 2[NΣXY − (ΣX)(ΣY)] = 2[(25)(7769) − (−108)(−340)]
      = 2(157,505) = 315,010
N²(N − 1) = (25)²(24) = 15,000

$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{301{,}486 + 183{,}750 - 315{,}010}{15{,}000}} = \sqrt{11.3484} = 3.368$$

$$t_0 = \frac{9.28}{3.37} = 2.754$$

n = N − 1 = 24
P < .05; .02 > P > .01
t .02 = 2.492; t .01 = 2.797

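The variance of the mean is then

$$s_{\bar{D}}^2 = \frac{283.71}{25} = 11.348$$

The standard error of the mean is

$$s_{\bar{D}} = \sqrt{11.348} = 3.37$$

In one operation, the calculation of s D̄ is

$$s_{\bar{D}} = \sqrt{\frac{\sum (D - \bar{D})^2}{N(N - 1)}} = \sqrt{\frac{N\sum D^2 - (\sum D)^2}{N^2(N - 1)}} = \sqrt{\frac{224{,}050 - 53{,}824}{(625)(24)}} = 3.37 \tag{5.04}$$

Then

$$t_0 = \frac{\bar{D}}{s_{\bar{D}}} = \frac{9.28}{3.37} = 2.7537 \tag{5.05}$$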


From the table of t, entering the row corresponding to n = N − 1
= 24, we find the chance of getting a value of t greater than or equal to
±t 0, that is, ±2.754, is slightly greater than 1 in 100 (t .01 = 2.797).
Hence, the null hypothesis is rejected at the 5 per cent level of significance.
We conclude that the two groups cannot be assumed to be random
samples from the same normal population, or that the mean scores are
significantly different at the 5 per cent level. The educational conclusion
under certain assumptions is that the two methods of teaching produced
significantly different results.
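
The paired calculation can be checked from the three totals used in the text (N = 25, ΣD = 232, ΣD² = 8962). The Python sketch below is illustrative only; the function name `paired_t` is hypothetical.

```python
from math import sqrt


def paired_t(n, sum_d, sum_d2):
    """Mean difference, its standard error (Formula 5.04), and t (5.05)
    computed from the sum and sum of squares of the paired differences."""
    mean_d = sum_d / n
    se = sqrt((n * sum_d2 - sum_d**2) / (n**2 * (n - 1)))
    return mean_d, se, mean_d / se


if __name__ == "__main__":
    mean_d, se, t0 = paired_t(25, 232, 8962)
    print(round(mean_d, 2), round(se, 3), round(t0, 3))
```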

When a large number of pairs of individuals are used it may be
advantageous to work with the original measures, thus avoiding the calculation
of the differences.

Using this method of calculation, the value of t 0 is

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 - 2\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{N(N - 1)}}} \tag{5.06}$$

The calculations are shown for the same problem used to illustrate
the method based on the calculation of the differences. This method
follows (with appropriate methods for reducing the mathematical calculations)
the more commonly used formula of the standard error of the
difference between the means of correlated measures:

$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_X^2 + \sigma_Y^2 - 2 r_{XY}\sigma_X\sigma_Y}{N}} \qquad (\text{when } X = X_1 \text{ and } Y = X_2) \tag{5.07}$$

The demonstration of the equivalence by applying the respective
methods to the same set of data is given in Table 21. It should be noted
that the unbiased estimate of σ² is used in both cases.
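
The equivalence of the two methods can be confirmed numerically from the raw totals of Table 21 (scores coded as deviations from 50: ΣX = −108, ΣX² = 12,526, ΣY = −340, ΣY² = 11,974, ΣXY = 7769). The following Python sketch is illustrative; the function name `se_diff_from_raw_sums` is hypothetical.

```python
from math import sqrt


def se_diff_from_raw_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Standard error of the difference between correlated means,
    computed from the original measures without forming differences."""
    n_ss_x = n * sum_x2 - sum_x**2          # N times sum of squares of x
    n_ss_y = n * sum_y2 - sum_y**2          # N times sum of squares of y
    n_sp_xy = n * sum_xy - sum_x * sum_y    # N times sum of products xy
    return sqrt((n_ss_x + n_ss_y - 2 * n_sp_xy) / (n**2 * (n - 1)))


if __name__ == "__main__":
    se = se_diff_from_raw_sums(25, -108, 12526, -340, 11974, 7769)
    mean_diff = (-108 - (-340)) / 25
    print(round(se, 3), round(mean_diff / se, 3))
```

Because ΣD = ΣX − ΣY and ΣD² = ΣX² + ΣY² − 2ΣXY, this route must give the same standard error as the method of differences.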

If we had not utilized the information provided by the experimental
design, different results would have been obtained as noted below. Using
the method for testing the significance of the difference between the
means of random samples as in Problem V.4, we have, since N 1 = N 2,

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N(N - 1)}}} = \frac{9.28}{\sqrt{32.3491}} = 1.6 \tag{5.08}$$

Entering the table of t with n = 2(N − 1) = 48, it is observed
(without interpolation) that we should expect to get a value of t greater
than or equal to ±t 0, i.e., ±1.6, in repeated sampling in more than 5 per
cent of the cases. The conclusions are, therefore, altered from those
drawn earlier by the calculation of t 0.

It is usually advisable to calculate t 0 by both methods, that is, by
Formulas (5.05) and (5.08), and make both tests
of significance. Sometimes one and at other times the other may be the
more sensitive. If either one or the other shows a significant difference
between the means, it is safe to accept the conclusion of significance. If,
as is most often the case in experimental work, the corresponding values
for the respective paired individuals are positively correlated, the standard
deviation of the differences will be thereby reduced. Against this
favoring circumstance must be weighed the fact that in treating the
results as a single sample, the number of degrees of freedom is only half
as great as if the two samples had been treated separately, that is, if
two random samples were used for experimental subjects. From the
findings of the two tests of significance for the same data, a direct statistical
measure of the efficacy of the basis of pairing used is made available. 2

Problem V.7. The sign test of significance. A simple test of significance
is available for application to the data in Problem V.6. This is the
“sign test” or “binomial series test” for the case of randomized blocks
with two columns (Refs. 10 and 5). The statistic used is the number of
positive differences among the differences of the several pairs of
individuals. The zero differences are usually divided evenly among the
positive and negative ones. Thus: $P_0 = (+\text{'s}) + \tfrac{1}{2}(0\text{'s})$.
The mean number of positive values expected according to the binomial
series $(\tfrac{1}{2} + \tfrac{1}{2})^{25}$ is

$$\mu = np = 12.5$$

$$\sigma = \sqrt{npq} = .5\sqrt{n} = 2.5$$

$$x = \frac{n(P_0 - .50)}{.5\sqrt{n}} = \frac{5.5}{.5\sqrt{25}} = \frac{5.5}{2.5} = 2.20$$

$x$ may be referred to a normal scale (Table I, Appendix), from which it
is found that $P = .0278$, or $P < .05$. The hypothesis of no difference
between the two groups as revealed by the differences in signs is rejected
at the 5 per cent level.
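
The arithmetic of the sign test can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function name `sign_test_normal` is hypothetical, and the count of 18 positive differences among 25 pairs is an assumption chosen because it reproduces the deviation of 5.5 from the expected 12.5 used above. The two-sided probability is taken from the normal curve through the standard library function `math.erfc`.

```python
import math


def sign_test_normal(n_pairs, n_positive):
    """Normal approximation to the sign test (binomial with p = q = 1/2).

    n_positive is the number of positive differences, with zero
    differences already split evenly between the two signs.
    """
    mean = n_pairs * 0.5                 # mu = np
    sd = 0.5 * math.sqrt(n_pairs)        # sigma = sqrt(npq) = .5 sqrt(n)
    x = (n_positive - mean) / sd         # normal deviate
    # Two-sided tail area of the unit normal curve beyond |x|.
    p_two_sided = math.erfc(abs(x) / math.sqrt(2.0))
    return x, p_two_sided


# Assumed count: 18 positive differences out of 25 pairs.
x, p = sign_test_normal(25, 18)
print(round(x, 2), round(p, 4))  # about 2.2 and 0.0278
```

With these assumed counts the sketch returns the same deviate, 2.20, and the same two-sided probability, .0278, as the hand calculation.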

The method differs from the more reliable t-test in using only the
information in the sign as compared with the total available information
in the actual values used by the latter. The former method may be
shown to be 62 per cent as efficient as the latter; that is, 62 pairs using the
t-test would give as precise results as 100 pairs in using the sign test. 3

Problem V.8. The significance of the difference between percentages.
There is frequent need to determine the significance of the
difference between two percentages. Take, for instance, the following
problem:

According to one investigator, 67 of an unselected sample of 793 males
and 3 of an unselected sample of 232 females from the same United States
Caucasoid population were color-blind. Is this evidence of a sex
difference in this trait? The hypothesis to be tested is

$$H_0: p_1 = p_2 = p$$

2 For further discussion of the efficiency of this experimental design see page 292.

3 For meaning of “efficiency,” see page 105.

The maximum likelihood estimate of $p$ is

$$p_0 = \frac{t_1 + t_2}{n_1 + n_2}$$

where $t_1$ is the number of color-blind individuals in the male sample;
$t_2$, in the female sample.

$$p_0 = \frac{67 + 3}{793 + 232} = .0683$$

$$x = \frac{\dfrac{t_1}{n_1} - \dfrac{t_2}{n_2}}{\sqrt{p_0 q_0 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\dfrac{67}{793} - \dfrac{3}{232}}{\sqrt{p_0 q_0 \left(\dfrac{1}{793} + \dfrac{1}{232}\right)}} \qquad (5.10)$$

where $q_0 = 1 - p_0$.

Referred to the normal scale, it is found that $P < .01$. Therefore, the
hypothesis is rejected; that is, there is a significant sex difference in
color-blindness in the population sampled. 4
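
As a check on the arithmetic, the test of the difference between two percentages may be sketched in Python. The function name `two_proportion_x` is hypothetical; the formula is the pooled-estimate form given above, and the two-sided probability is read from the normal curve with `math.erfc`. The deviate of about 3.8 is what this sketch computes from the counts in the problem; it is not a figure quoted from the text.

```python
import math


def two_proportion_x(t1, n1, t2, n2):
    """Normal deviate for the difference between two sample proportions,
    using the pooled estimate p0 = (t1 + t2) / (n1 + n2)."""
    p0 = (t1 + t2) / (n1 + n2)
    q0 = 1.0 - p0
    std_error = math.sqrt(p0 * q0 * (1.0 / n1 + 1.0 / n2))
    x = (t1 / n1 - t2 / n2) / std_error
    p_two_sided = math.erfc(abs(x) / math.sqrt(2.0))
    return p0, x, p_two_sided


# 67 color-blind of 793 males; 3 color-blind of 232 females.
p0, x, p = two_proportion_x(67, 793, 3, 232)
print(round(p0, 4), round(x, 2), p < 0.01)
```

The pooled proportion comes out at about .0683 and the probability is well below .01, in agreement with the conclusion drawn in the text.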

Problem V.9. The significance of the difference between the absolute
variabilities of two groups. The following problems illustrate the
method of testing the significance of the difference between two variances:

(a) From the measurements of heights in centimeters of 2518 boys
and 2538 girls, both groups fourteen years of age, the sum of squares
of the deviations for the former was 189,811.041552 and for the latter
114,896.931496. Is there a significant difference in absolute variability?
The calculations are carried out in Table 22.


TABLE 22

The Significance of the Difference between Two Estimates of Variance

Sex      Degrees of    Sum of squares    Mean      Log mean    1/n
         freedom (n)                     square    square
Male     2517          189,811.641552    75.412    4.3226      .0003973
Female   2537          114,896.931496    45.306    3.8133      .0003942
                                         Diff:     0.5093      Sum: .0007915


The mean squares are obtained by dividing the sum of squares by the 
degrees of freedom. The difference of the logarithms is 0.5093, so z is 
0.2546. The variance of z is one-half the sum of the last column, or 
.0003957; the standard error of z is .0199. 

$$\frac{z}{\sigma_z} = \frac{.2546}{.0199} = 12.8$$

Referred to the normal scale, we find:

$$P < .001$$

Therefore, there is a significant difference in the variability of the
two sexes. 5

4 The $\chi^2$ test is an exact test for this problem.
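
The large-sample z-test of part (a) can be retraced with a short Python sketch. The function name `fisher_z_large_sample` is hypothetical; the inputs are the mean squares and degrees of freedom of Table 22. Because the sketch uses full-precision logarithms, its value of z differs from the tabled .2546 in the fourth decimal place.

```python
import math


def fisher_z_large_sample(ms1, df1, ms2, df2):
    """Fisher's z for two mean squares, with its large-sample standard error.

    z        = half the difference of the natural logs of the mean squares
    var(z)   = half the sum of the reciprocals of the degrees of freedom
    """
    z = 0.5 * (math.log(ms1) - math.log(ms2))
    std_error = math.sqrt(0.5 * (1.0 / df1 + 1.0 / df2))
    return z, std_error, z / std_error


# Mean squares and degrees of freedom for boys and girls (Table 22).
z, se, ratio = fisher_z_large_sample(75.412, 2517, 45.306, 2537)
print(round(z, 4), round(se, 4), round(ratio, 1))
```

The ratio of z to its standard error is about 12.8, far beyond any ordinary level of significance on the normal scale.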

(b) Two samples of boys were available in a city school system.
One sample of 121 boys of twelve years of age had a mean weight of 72.7
lb. Fifteen years later, another sample of 61 boys twelve years of age
from the same school had a mean weight of 77.74 lb. The mean square
of the weights, in (lb)$^2$, of the first sample was 141.60 and that of the
second sample, 95.756. Is the difference in variability significant?

The calculations for the test of significance are set forth in Table 23.


TABLE 23

The z-Test of the Significance of the Difference between Two Variance
Estimates

Sample    Degrees of    Weight mean square    (1/2) log_e (mean square)
          freedom       (lb)^2
1         120           141.600               2.4765
2          60            95.756               2.2809
                                              z_0 = 0.1956

We enter the z-table of Fisher (Ref. 10) with $n_1 = 120$ and $n_2 = 60$
and by interpolation find that $z_{.05} = .1917$. We could enter Table IV,
Appendix, of the variance ratio, $F$, with $n_1 = 120$, $n_2 = 60$, and
$F_0 = \dfrac{141.600}{95.756} = 1.479$, and find $F_{.05} = 1.472$.

Since $z_0$ is slightly larger than $z_{.05}$, or $F_0$ slightly larger than
$F_{.05}$, we conclude that the difference in variability in weight between
the two groups of boys is significant (at the 5 per cent level).
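
The relation between the variance ratio $F$ and Fisher's $z$ in part (b) is simply $z = \tfrac{1}{2}\log_e F$, which the following Python sketch makes explicit. The critical values 1.472 and .1917 are the 5 per cent points quoted in the text for 120 and 60 degrees of freedom; they are entered here as constants, not computed.

```python
import math

# Mean squares of the two samples of weights, in (lb)^2 (Table 23).
mean_square_1 = 141.600   # 120 degrees of freedom
mean_square_2 = 95.756    # 60 degrees of freedom

F0 = mean_square_1 / mean_square_2   # variance ratio, larger over smaller
z0 = 0.5 * math.log(F0)              # Fisher's z = (1/2) log_e F

# 5 per cent points as quoted in the text (taken from the tables, not derived).
F_05 = 1.472
z_05 = 0.1917

print(round(F0, 3), round(z0, 4))    # about 1.479 and 0.1956
print(F0 > F_05, z0 > z_05)          # both comparisons lead to the same verdict
```

Either statistic may be used; the two comparisons necessarily agree, since $z$ is a monotonic function of $F$.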

Problem V.10. To test the homogeneity of a set of estimated variances.
The statistical analysis of data often involves the calculation of a
number of estimated variances and the testing of whether the sample
estimates are significantly different. Three tests of homogeneity of
variability are described here.

Neyman and Pearson (Ref. 16) used the criterion $L_1$, the ratio of a
weighted geometric to a weighted arithmetic mean of the mean squares
from which the variances were estimated, in order to test the hypothesis,

$$H_1: \sigma_1 = \sigma_2 = \cdots = \sigma_k = \sigma$$

This is the test that these $k$ independent samples have been drawn from
normal populations having a common standard deviation.

5 The near normality of $z$ for large and equal values of $n_1$ and $n_2$ (see page 55)
has been the basis of the test used here. Either the z-test or the variance ratio $F$
could have been used.

Welch (Ref. 24) indicated how $L_1$ could be generalized and how the
weighting for the different sums of squares could be modified. Nayer
(Ref. 15) computed tables of the 5 per cent and 1 per cent probability
levels for $L_1$ for the case of equally sized samples. He also considered
how far, in the case of unequally sized samples, the probability levels for
$L_1$ might be obtained from his tables. Nair (Ref. 14) investigated the
form of the true distribution of $L_1$.

The test presented here is based on the modified formula of Welch
and the use of Nayer's tables of the $L_1$-distribution.

Welch’s equation is

$$L_1 = \frac{N \prod_s \left(\dfrac{\theta_s}{n_s}\right)^{n_s/N}}{\sum_s \theta_s} \qquad (5.11)$$

where $s = 1, 2, \ldots, k$; $\prod$ denotes product; $\sum$ denotes summation;
$n_s$, the number of individuals within the $s$th sample; $N$, the number of
individuals in all the samples; and $\theta_s$ is the sum of squares of the
errors or the residual of a sample. In the case considered here,

$$\theta_s = \sum_i (X_{si} - \bar{X}_s)^2$$

where $X_{si}$ represents the value of the variate for the $i$th individual in
the $s$th sample and $\bar{X}_s$ represents the mean of the $s$th sample.

Nayer’s tables are entered with $k$, the number of samples, and
$n = N/k$, the average sample size. Hartley (Ref. 12) later indicated that the
geometric rather than the arithmetic mean should be used when an
average of unequally sized samples is needed. In using $L_1$-tables,
rejection of the hypothesis, $H_1$, is indicated when the obtained $L_1$ is equal
to or less than the tabled values of $L_1$ at the respective 1 or 5 per cent
level (Table V, Appendix).

The second test of homogeneity of variances was given by Bartlett
(Ref. 2). He suggested a test analogous to the $L_1$ test in which the sums
of squares are weighted with the appropriate number of degrees of freedom
instead of with the number of observations as in the Neyman-Pearson
criterion. Thus where $s_t^2$ is the unbiased estimate of $\sigma_t^2$ based on a
sum of squares having $\nu_t$ degrees of freedom, and there are $k$ independent
estimates, the test function is

$$-2\log_e \mu = N \log_e\left[\frac{\sum_{t=1}^{k} \nu_t s_t^2}{N}\right] - \sum_{t=1}^{k}\left(\nu_t \log_e s_t^2\right) \qquad (5.12)$$

where

$$N = \sum_{t=1}^{k} \nu_t$$

and natural logarithms to the base $e$ are used. Where none of the $\nu_t$'s are
too small, $-2\log_e \mu$ is distributed approximately as $\chi^2$ with $k - 1$
degrees of freedom if the $\sigma_t^2$ $(t = 1, 2, \ldots, k)$ have a common value.
Bartlett gave a corrective factor, $C$, for small samples:

$$C = 1 + \frac{1}{3(k - 1)}\left[\sum_{t=1}^{k}\frac{1}{\nu_t} - \frac{1}{N}\right] \qquad (5.12a)$$

He indicated that the quantity

$$\frac{-2\log_e \mu}{C}$$

followed approximately the same $\chi^2$ distribution.

Bishop and Nair (Ref. 4) demonstrated that even in using the correction
factor $C$, the $\chi^2$ approximation is not altogether satisfactory if some
of the degrees of freedom, $\nu_t$, are 1, 2, or 3. Later, Hartley (Ref. 12)
derived a method of approximating to the distribution of Bartlett's
$-2\log_e \mu$, which was shown to be sufficiently accurate to permit the
degrees of freedom to drop to 2 with a fair approximation even if some
of the variance estimates based on 1 degree of freedom are among the
$k$ values. In Hartley's method the probability integral is represented
as a weighted mean of $\chi^2$ integrals. Thompson and Merrington (Ref.
23) have published tables of the criterion called $M$, based on Hartley's
approximation.

We shall illustrate the three tests of homogeneity of the variances by 
applying them to the same set of data. 

In Table 24, column three is given a set of five estimates of variance,
calculated from five samples of intelligence test records of pupils in five
different grades of a given school. It is desired to test whether or not
there are any real grade differences in the test score dispersion of the
pupils. To this end the calculations in obtaining the value of the
criterion $M_0$ are set forth as shown in the table.


TABLE 24

Calculations for Obtaining the Value of the Criterion for Bartlett’s Test
of the Homogeneity of Estimated Variances

(1)      (2)        (3)              (4)        (5)           (6)               (7)
Grade    No. of     Intelligence     $\nu_t$    $\log_e s_t^2$  $\nu_t \log_e s_t^2$  $1/\nu_t$
         pupils     variance
         $n_t$      $s_t^2$ (score$^2$)
3        35          59.5345         34         4.08656       138.94304         0.02941
4        37          98.4369         36         4.58942       165.21912         0.02777
5        35         105.1378         34         4.65527       158.27918         0.02941
6        36         138.3325         35         4.92966       172.53810         0.02857
7        37          39.4520         36         3.67509       132.30324         0.02777
Total    180                         175 = N                  767.28268         0.14293

We obtain further:

$$\frac{\sum \nu_t s_t^2}{\sum \nu_t} = \frac{15{,}404.4960}{175} = 88.0257$$

$$\log_e \frac{\sum \nu_t s_t^2}{\sum \nu_t} = 4.47763$$

Following (5.12), we obtain

$$-2\log_e \mu = M_0 = 175 \times 4.47763 - 767.28268 = 16.3026$$

Entering Table VII of the 1 per cent points of the $M$-distribution
(Ref. 23) with $k = 5$ it is found that all entries opposite $k = 5$ are less
than 16.3026. Without further calculation, therefore, it may be
concluded that $M_0 = 16.3026$ is significant at the 1 per cent level. We may
infer that a significant difference exists in the intelligence dispersion, as
measured by this test, among the five grades.

Since the tables of the $M$ distribution are not as yet readily accessible,
the test may be made by application of (5.12) and (5.12a). Thus:

$$\frac{\sum \nu_t s_t^2}{N} = \frac{15{,}404.4960}{175} = 88.0257$$

$$N \log_e \frac{\sum \nu_t s_t^2}{N} = 175(4.47763) = 783.58525$$

$$\sum_{t=1}^{k}\left(\nu_t \log_e s_t^2\right) = 767.28268$$

$$C = 1 + \frac{1}{3(5 - 1)}\left(\frac{1}{34} + \frac{1}{36} + \frac{1}{34} + \frac{1}{35} + \frac{1}{36} - \frac{1}{175}\right) = 1.01144$$

$$\chi_0^2 = \frac{783.58525 - 767.28268}{1.01144} = 16.118$$

We enter the $\chi^2$-table (Table III, Appendix) with $k - 1$, or 4 degrees
of freedom. We find that our obtained value $\chi_0^2$ is larger than the table
value $\chi^2 = 13.277$ at the one per cent point. Therefore, we reject the
hypothesis, $H_0$, and conclude that a significant difference exists in the
variability of intelligence test scores among these five grades.
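
Bartlett's criterion and its correction factor lend themselves to a short computation. The following Python sketch, in which the function name `bartlett_criterion` is hypothetical, applies (5.12) and (5.12a) to the five variance estimates of Table 24. The 1 per cent point 13.277 for 4 degrees of freedom is the tabled value quoted in the text and is entered as a constant.

```python
import math


def bartlett_criterion(variances, dfs):
    """Return (M, C, M / C) for Bartlett's test of homogeneity of variances.

    variances: unbiased variance estimates s_t^2
    dfs:       their degrees of freedom nu_t
    """
    k = len(variances)
    N = sum(dfs)
    pooled = sum(nu * s2 for nu, s2 in zip(dfs, variances)) / N
    M = N * math.log(pooled) - sum(
        nu * math.log(s2) for nu, s2 in zip(dfs, variances)
    )
    C = 1.0 + (sum(1.0 / nu for nu in dfs) - 1.0 / N) / (3.0 * (k - 1))
    return M, C, M / C


# Variance estimates and degrees of freedom for grades 3 to 7 (Table 24).
variances = [59.5345, 98.4369, 105.1378, 138.3325, 39.4520]
dfs = [34, 36, 34, 35, 36]

M, C, chi2 = bartlett_criterion(variances, dfs)
print(round(M, 2), round(C, 5), round(chi2, 2))
print(chi2 > 13.277)   # 1 per cent point of chi-square, 4 degrees of freedom
```

The sketch gives $M_0$ of about 16.30, $C$ of about 1.01144, and a corrected $\chi^2$ of about 16.12, in agreement with the hand calculation.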

It may be pointed out here that Bartlett's test would appear
advantageous in comparison with the $L_1$-test when the size of the samples is
much larger than $n = 60$ (the limit of finite values as given in the Nayer
table) and an interpolation between 60 and infinity needs to be made.
Since the range of the $L_1$-values is only from 0 to 1, the test is not highly
sensitive.

The tables of the $M$ distribution may encourage the use of Hartley's
approximation, which is likely to be more convenient as well as slightly
more accurate.

We now apply the $L_1$-test, which is made as follows, to the same data
as in Table 24.

We calculate first the value $\log L_1$ from Formula (5.11). The
calculations are set forth in Table 25.


TABLE 25

The Calculation of log $L_1$ for the $L_1$-Test of Homogeneity of Variances

$n_s$     $f_s$    $\log n_s$   $n_s \log n_s$   $\theta_s$       $\log \theta_s$   $n_s \log \theta_s$
35        34       1.5441       54.0435           2,024.1714     3.3062            115.7170
37        36       1.5682       58.0234           3,543.7297     3.5495            131.3315
35        34       1.5441       54.0435           3,574.6857     3.5532            124.3620
36        35       1.5563       56.0268           4,841.6389     3.6850            132.6600
37        36       1.5682       58.0234           1,420.2703     3.1524            116.6388
N = 180   175                   280.1606         15,404.4960                       620.7093

$$\log L_1 = \log N - \frac{1}{N}\sum_s n_s \log n_s + \frac{1}{N}\sum_s n_s \log \theta_s - \log\Big(\sum_s \theta_s\Big)$$

$$= \log 180 - \frac{1}{180}(280.1606) + \frac{1}{180}(620.7093) - \log(15{,}404.4960)$$

$$= 2.25527 - 1.55645 + 3.44838 - 4.18769 = \bar{1}.95951$$

We find that $L_1$ corresponding to the logarithm $\bar{1}.95951$ (that is,
$-0.04049$) is .911.

$f$ = harmonic mean of the $f_s$:

$$\frac{1}{f} = \frac{1}{5}\left(\frac{1}{34} + \frac{1}{36} + \frac{1}{34} + \frac{1}{35} + \frac{1}{36}\right), \qquad f = 34.98$$

We enter Nayer’s tables (Table V, Appendix) with $k = 5$ and $f = 35$
and note that our value, .911, is less than the interpolated 1 per cent
value of $L_1$. Therefore, we reject the hypothesis and infer that there is a
real difference in variability in the intelligence test scores among the
five grades.
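
The logarithmic form of Welch's equation (5.11) is easily retraced by machine. The Python sketch below is an illustration only; the function name `welch_L1` is hypothetical, the sample sizes and sums of squares are those of Table 25, and common logarithms are used as in the hand calculation. The tabled 1 per cent point of $L_1$ is not reproduced, since it must be read from Nayer's tables.

```python
import math


def welch_L1(sizes, sums_of_squares):
    """L1 criterion: N * prod((theta_s / n_s) ** (n_s / N)) / sum(theta_s),
    evaluated through common logarithms."""
    N = sum(sizes)
    log_L1 = (
        math.log10(N)
        - sum(n * math.log10(n) for n in sizes) / N
        + sum(n * math.log10(th) for n, th in zip(sizes, sums_of_squares)) / N
        - math.log10(sum(sums_of_squares))
    )
    return 10.0 ** log_L1


sizes = [35, 37, 35, 36, 37]                      # n_s
thetas = [2024.1714, 3543.7297, 3574.6857,        # theta_s, sums of squares
          4841.6389, 1420.2703]
degrees = [34, 36, 34, 35, 36]                    # f_s

L1 = welch_L1(sizes, thetas)
f_harmonic = len(degrees) / sum(1.0 / f for f in degrees)

print(round(L1, 3), round(f_harmonic, 2))   # about 0.911 and 34.98
```

Since $L_1$ cannot exceed 1, values near 1 indicate homogeneous variances, and it is small values of the criterion that lead to rejection.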

Problem V.11. The significance of the difference between two
correlation coefficients. The following product-moment coefficients of
correlation were obtained between scores on two examinations in algebra
administered at the end of the school year in May and at the beginning
of the next school year in September. These results were obtained by
a mathematics teacher in two different schools:

School A: $r_1 = .73$; $n_1 = 59$

School B: $r_2 = .62$; $n_2 = 48$

We enter the normal table (Table I, Appendix) and find that for
$z = .45$, $P > .05$. We, therefore, conclude that there is no significant
difference between the two correlation coefficients.

Problem V.12. The significance of the difference between correlation
coefficients determined on the same sample. The following
product-moment coefficients of correlation were obtained in a class in college
biology, consisting of 73 students.

$r_{y1} = .30$, the correlation coefficient between scores on a test on
vocabulary (1) and scores on a test for interpreting various situations
dealing with states of health and disease (y);

$r_{y2} = .42$, the correlation coefficient between scores on a test of
biological principles (2) and scores on test (y);

$r_{12} = .603$, the correlation coefficient between tests (1) and (2).

The problem is to test the significance of the difference between the
correlation coefficients, $r_{y1}$ and $r_{y2}$. Since these correlations were
obtained on the same sample the procedure described on page 54 is
followed:

$$F = \frac{(r_{y1} - r_{y2})^2 (N - 3)(1 + r_{12})}{2(1 - r_{12}^2 - r_{y1}^2 - r_{y2}^2 + 2 r_{12} r_{y1} r_{y2})}$$

$$F_0 = \frac{(.30 - .42)^2 (73 - 3)(1 + .603)}{2[1 - (.603)^2 - (.30)^2 - (.42)^2 + 2(.603)(.30)(.42)]} = 1.55$$
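
The substitution in the formula above can be verified with a few lines of Python. The function name `dependent_correlations_F` is hypothetical; the expression is the one given in the text for two correlations sharing a common variable in a single sample.

```python
def dependent_correlations_F(r_y1, r_y2, r_12, N):
    """F (1 and N - 3 degrees of freedom) for the difference between
    r_y1 and r_y2 when both are computed on the same N individuals."""
    numerator = (r_y1 - r_y2) ** 2 * (N - 3) * (1.0 + r_12)
    denominator = 2.0 * (
        1.0 - r_12 ** 2 - r_y1 ** 2 - r_y2 ** 2 + 2.0 * r_12 * r_y1 * r_y2
    )
    return numerator / denominator


F0 = dependent_correlations_F(0.30, 0.42, 0.603, 73)
print(round(F0, 2))   # about 1.55
```

The bracketed quantity in the denominator is the determinant of the three-variable correlation matrix, so it is always positive for admissible correlations.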


We enter the table of $F$ (Table IV, Appendix) with $n_1 = 1$ and
$n_2 = 70$. We find that our value, 1.55, is less than $F_{.05} = 3.98$; hence
$F_0$ is not significant. We conclude that there is no significant difference
in the two correlation coefficients.

Problem V.13. To test the significance of a regression coefficient.

In a simple regression of one independent variable, an important test is
whether the regression coefficient is significantly different from zero, or
the test of the hypothesis that there is no regression of $y$ on $x$ in the
population sampled. For the required test the theoretical model against
which the experimental results may be compared is the t-distribution.
The value of $t_0$ is calculated from the sample and the table of $t$ is entered
with $n = N - 2$ to determine the probability of getting a value of $t$
greater than or equal to $\pm t_0$ in repeated sampling. Here $t_0$ is the ratio
of the regression coefficient, $b_{yx}$, to the standard error of $b_{yx}$; that is,

$$t_0 = \frac{b_{yx}}{\sigma_b} \qquad (5.13)$$

The standard error of $b$ is given by

$$\sigma_b = \frac{\sigma_{y.x}}{\sqrt{\sum x^2}} \qquad (5.14)$$

where $\sigma_{y.x}$, the standard error of estimate, is obtained from

$$\sigma_{y.x} = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}} \qquad (5.15)$$

in which $Y_0$ is the observed value of $Y$ and $Y_E$ is the value of $Y$ estimated
from the regression equation. The number of degrees of freedom is
$N - 2$, since two statistics are estimates of two different parameter
values in the regression equation: $Y_E = a + bx$. The calculations and
procedures are illustrated in determining the regression coefficient and
in testing its significance in Table 26.

This problem was that of setting up a regression equation for the
purpose of predicting a knowledge of one character $Y$, from a knowledge
of a second character $X$. In this case it was desired to predict the score
of an individual on one form of an examination from his score on the
second form. The prediction equation is

$$Y_E = \bar{Y} + \frac{\sum xy}{\sum x^2}(X - \bar{X}) \qquad (5.16)$$

This equation is called the regression equation for estimating $Y$ from $X$.
It is fitted to the observational data by the method of least squares.

In this regression problem, it is necessary to run two tests of
significance: (1) for the regression coefficient $b_{yx}$ and (2) for the mean of the
dependent variable, $\bar{Y}$.

The test of significance for the regression coefficient, $b_{yx}$, is given by

$$t_0 = \frac{b_{yx}}{\sigma_b}$$

In our problem the values are

$$t_0 = \frac{.9873}{.0729} = 13.5$$

We enter the table of $t$ with $n = N - 2 = 25 - 2 = 23$ and find that
$P < .001$. Therefore, the regression coefficient is highly significant.

TABLE 26

Calculations for Setting Up the Regression Equation between the Scores on Two
Forms of a Test

Individual     X      Y     X'     Y'    X'^2    Y'^2    X'Y'
 1            46     52     -4      2      16       4      -8
 2            38     38    -12    -12     144     144     144
 3            64     63     14     13     196     169     182
 4            73     65     23     15     529     225     345
 5            61     58     11      8     121      64      88
 6            34     33    -16    -17     256     289     272
 7            57     49      7     -1      49       1      -7
 8            66     63     16     13     256     169     208
 9            25     24    -25    -26     625     676     650
10            30     26    -20    -24     400     576     480
11            45     33     -5    -17      25     289      85
12            73     71     23     21     529     441     483
13            45     48     -5     -2      25       4      10
14            55     63      5     13      25     169      65
15            66     70     16     20     256     400     320
16            49     46     -1     -4       1      16       4
17            64     65     14     15     196     225     210
18            45     46     -5     -4      25      16      20
19            61     62     11     12     121     144     132
20            52     46      2     -4       4      16      -8
21            67     68     17     18     289     324     306
22            59     53      9      3      81       9      27
23            55     55      5      5      25      25      25
24            51     52      1      2       1       4       2
25            50     48      0     -2       0       4       0
Total       1331   1297     81     47    4195    4403    4035

X and Y are scores on two tests.
X' = X - 50 and Y' = Y - 50.

$\bar{X} = 50 + \frac{81}{25} = 53.24$ = mean of X scores

$\bar{Y} = 50 + \frac{47}{25} = 51.88$ = mean of Y scores

Let $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

$$\sum x^2 = \sum (X - \bar{X})^2 = \sum X'^2 - \frac{(81)^2}{25} = 4195 - 262.44 = 3932.56$$

$$\sum y^2 = 4403 - \frac{(47)^2}{25} = 4403 - 88.36 = 4314.64$$

$$\sum xy = \sum X'Y' - \frac{(81)(47)}{25} = 4035 - 152.28 = 3882.72$$

Regression Equation

$$Y_E = \bar{Y} + \frac{\sum xy}{\sum x^2}(X - \bar{X}) = 51.88 + \frac{3882.72}{3932.56}(X - 53.24)$$

$$= 51.88 + .9873(X - 53.24) = 51.88 + .9873X - 52.56 = .9873X - 0.68$$

Significance of Regression Coefficient

$$\sigma_b = \frac{\sigma_{y.x}}{\sqrt{\sum x^2}} = \frac{4.57}{\sqrt{3932.56}} = \frac{4.57}{62.7} = .0729 = \text{standard error of the regression coefficient}$$

$$t_0 = \frac{.9873}{.0729} = 13.5; \qquad P < .01$$

Standard Error of Estimate

$$\sigma_{y.x} = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}} = \sqrt{\frac{\sum y^2 - \dfrac{(\sum xy)^2}{\sum x^2}}{N - 2}} = \sqrt{\frac{4314.64 - \dfrac{(3882.72)^2}{3932.56}}{23}}$$

$$= \sqrt{\frac{4314.64 - 3833.51}{23}} = \sqrt{\frac{481.13}{23}} = \sqrt{20.9189} = 4.57 = \text{standard error of estimate}$$

Test of Significance of $\bar{Y}$

$$t_0 = \frac{(\bar{Y} - \alpha)\sqrt{N}}{s}, \qquad \text{where } s = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}}, \qquad n = N - 2 = 23$$


A simpler alternative test of the significance of the regression
coefficient can be made, where the correlation coefficient, $r_{xy}$, is available
and under the conditions indicated below.

When the regression of $y$ on $x$ is linear and the arrays of $y$ are normal
and homoscedastic 6 (see Ref. 11), the t-test affords the exact test of the
significance of the deviation of a sample regression coefficient from any
hypothetical value (specified by the hypothesis tested) divided by an
estimate of its standard error, considered as a random sample of similar
estimates in repeated samples with the same values of $x$.

When the hypothesis under test is that the population value, $\rho$, is
zero and when the distribution of $X$ is continuous, the t-test for $b_{yx}$ is also
an exact test for the sample correlation coefficient, $r$:

$$t = \frac{b}{s_b} = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} \qquad (5.17)$$

This equality is illustrated by calculating $r$ for the set of data in Table 26.
Thus, with $r = .94$,

$$t_0 = \frac{.9873}{.0729} = \frac{.94\sqrt{23}}{\sqrt{1 - (.94)^2}}$$

Entering the t-table with $n = 23$, it is observed that $P < .001$.
Therefore, the observed value of $b$ (or $r$) is highly significant.
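
The regression calculations of Table 26, and the identity (5.17), can be retraced from the three sums $\sum x^2$, $\sum y^2$, and $\sum xy$ alone. The Python sketch below is a minimal illustration with hypothetical variable names. It also shows that the two sides of (5.17) agree exactly when $r$ is carried to full precision; with $r$ rounded to .94 the right-hand side comes out near 13.2 rather than 13.5.

```python
import math

N = 25
sum_x2 = 3932.56    # sum of squared deviations of X
sum_y2 = 4314.64    # sum of squared deviations of Y
sum_xy = 3882.72    # sum of products of deviations

b = sum_xy / sum_x2                              # regression coefficient
residual_ss = sum_y2 - sum_xy ** 2 / sum_x2      # sum of (Y_0 - Y_E)^2
s_estimate = math.sqrt(residual_ss / (N - 2))    # standard error of estimate
s_b = s_estimate / math.sqrt(sum_x2)             # standard error of b
t_from_b = b / s_b

r = sum_xy / math.sqrt(sum_x2 * sum_y2)          # correlation coefficient
t_from_r = r * math.sqrt(N - 2) / math.sqrt(1.0 - r ** 2)

# Same formula with r rounded to two decimals, as in the hand calculation.
t_rounded_r = 0.94 * math.sqrt(N - 2) / math.sqrt(1.0 - 0.94 ** 2)

print(round(b, 4), round(s_estimate, 2), round(s_b, 4))
print(round(t_from_b, 1), round(t_from_r, 1), round(t_rounded_r, 1))
```

The sensitivity to rounding arises because $1 - r^2$ is small when $r$ is near 1, so that a change in the third decimal of $r$ is magnified in $t$.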

For a test of the hypothesis that two regression coefficients $b_1$ and $b_2$,
obtained from two random samples of sizes $N_1$ and $N_2$, are from the same
population, $t_0$ is given by

$$t_0 = \frac{b_1 - b_2}{s'\sqrt{\dfrac{1}{\sum x_1^2} + \dfrac{1}{\sum x_2^2}}} \qquad (5.18)$$

where

$$s' = \sqrt{\frac{\sum (Y_1 - Y_{E_1})^2 + \sum (Y_2 - Y_{E_2})^2}{N_1 + N_2 - 4}} \qquad (n = N_1 + N_2 - 4) \qquad (5.19)$$

6 For tests of linearity and homoscedasticity, see page 241.

As an alternative, the significance of the difference between the two
correlation coefficients could be tested as in Problem V.11.

Problem V.14. The significance of the mean of the dependent variable
in a simple regression equation. The same set of data used in the
preceding problem may be presented to illustrate the test of significance
of the second estimate in the regression equation, the estimate of any
hypothetical value $\alpha$. In this case the t-distribution may also be used.
The sample value of $t_0$ is

$$t_0 = \frac{(\bar{Y} - \alpha)\sqrt{N}}{s} \qquad (5.20)$$

$$\text{where } s = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}} \qquad (5.21)$$

Then for this sample and where $\alpha$ is specified as zero, $t_0$ becomes

$$t_0 = \frac{51.88\sqrt{25}}{4.57} = 56.7 \qquad (n = N - 2)$$

Obviously, $P < .001$.

Applications of the Chi-Square Model. The chi-square (Ref. 21)
model has wide application in statistics, particularly as a test of
significance in dealing with enumerative data so characteristic of the study
of attributes. It is appropriate for testing whether a set of observed
values differs significantly from those which would occur if some specified
hypothesis were true. One general method of testing such a hypothesis
is to work out results which would be expected theoretically and then to
compare these with the observations.

Problem V.15. To test the effectiveness of principles of classification.
We may have individuals classified by two characteristics and wish
to test the hypothesis that the characteristics are independent or that the
principles of classification are independent.

In applying the $\chi^2$-test to two or more classifications, usually the
statistical hypothesis under test is that the two characteristics upon
which the individuals have been classified are independent of one another,
and then the truth or falsity of the hypothesis is tested. This procedure
is equivalent to determining whether a set of obtained values differs
significantly from those which would result if only chance factors were in
operation.

In the following example the $\chi^2$-test is applied to a 2 × 2-fold
contingency table (Table 27).

Here 366 twins have been classified on the basis of two characteristics
according to (1) their genetic constitution, that is, according to whether
they are identical or fraternal twins, and (2) the presence or absence of
mental deficiency. The numbers of identical and fraternal twins are
recorded in the marginal totals in the last column of the table of observed
values. The number of concordant and disconcordant twins with respect
to mental deficiency is given in the marginal totals in the last row of the
table.

The $\chi^2$-test is applied to determine the independence of these two
factors. The geneticist or psychologist might state the problem thus:
Assuming that the data are accurate, homogeneous, and unselected, with
what frequency could so large a disproportion between the two classes of
twins arise if the same causes leading to mental deficiency had been
operative on the two?

TABLE 27

Concordance and Disconcordance in Identical and Fraternal Twins for
Mental Deficiency (After Rosanoff, Ref. 20)

Observed values

Type                Number         Number           Total
                    concordant     disconcordant
Identical twins     115 (a)         11 (b)          126
Fraternal twins     128 (c)        112 (d)          240
Total               243            123              366

Expected values

Type                Number         Number           Total
                    concordant     disconcordant
Identical twins      83.66 (a)      42.34 (b)       126
Fraternal twins     159.34 (c)      80.66 (d)       240
Total               243.00         123.00           366


The number of observations to be expected in each cell where only
chance factors are operative can be calculated from the total frequency
in this way: Multiply the total number of identical twins, 126, by the
total number of concordant twins, 243, that is, 126 × 243 = 30,618, and
divide this product by the total number of twins in the sample, 366, that
is, 30,618/366 = 83.66. The expected number in the other cells of the
tables can be calculated in the same way. This need not be done,
however, in a 2 × 2 table. Since the marginal totals are fixed, the expected
values for only one cell need be calculated, the others being filled in by
subtraction. Thus, the expected value for cell b is 126 - 83.66 = 42.34;
that for cell c is 243 - 83.66 = 159.34; and that for cell d is 123 - 42.34
= 80.66.

$\chi^2$ is given by the formula

$$\chi^2 = \sum \frac{(f_o - f_t)^2}{f_t} \qquad (5.22)$$

where $f_o$ stands for observed frequency and $f_t$ for expected frequency.
The square of the differences between the observed and expected values
is divided by the expected value for each cell. These quotients are
summed to give $\chi^2$.

The calculations for the above data are presented in Table 28. 


TABLE 28

The Calculation of $\chi^2$ for the Data in Table 27

Cell     $f_o$    $f_t$      $(f_o - f_t)$   $(f_o - f_t)^2$   $(f_o - f_t)^2 / f_t$
a        115       83.66      31.34          982.1956          11.74
b         11       42.34     -31.34          982.1956          23.20
c        128      159.34     -31.34          982.1956           6.17
d        112       80.66      31.34          982.1956          12.18
Total    366                                                   $\chi_0^2$ = 53.29


The calculated value $\chi_0^2$ is used to determine the probability of getting,
on a random sample, a value of $\chi^2$ equal to or higher than $\chi_0^2$ in repeated
sampling. The alternative is the probability that the difference between
the observed and expected values may be attributable to chance alone.
This probability is obtainable from Table III, Appendix, Distribution
of $\chi^2$. The number of degrees of freedom with which the table is entered
is in this problem equal to 1, since it was observed that only one of the
cell frequencies could be filled in independently. When this quantity is
specified, the other cells can be filled in by using the marginal totals.

Therefore, we enter the $\chi^2$-table with a value of $\chi_0^2 = 53.29$ and $n = 1$.
It is noted that for values of $\chi^2$ greater than 10.827 the probability that
the differences between the observed and expected frequencies could
have arisen by chance is < .001. The table does not give the value of $P$
for a value of $\chi^2 = 53.29$. The probability, however, is much less than 1
in 1000. Hence it may be concluded that the system of classification
used in this problem was effective, or that the two basic characteristics,
type of twin and mental deficiency, are associated.
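
The computation of the expected frequencies and of $\chi^2$ for a contingency table can be sketched in Python as follows. The function name `chi_square_independence` is hypothetical; the sketch works for any table of r rows and c columns, though it is applied here only to the 2 × 2 table of twins. The tabled point 10.827 is quoted from the text and entered as a constant.

```python
def chi_square_independence(table):
    """Expected frequencies, chi-square, and degrees of freedom for an
    r x c contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    expected = []
    chi2 = 0.0
    for i, row in enumerate(table):
        expected_row = []
        for j, f_o in enumerate(row):
            # Expected count: (row total x column total) / grand total.
            f_t = row_totals[i] * col_totals[j] / total
            expected_row.append(f_t)
            chi2 += (f_o - f_t) ** 2 / f_t
        expected.append(expected_row)

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi2, degrees_of_freedom


# Rows: identical, fraternal twins. Columns: concordant, disconcordant.
observed = [[115, 11], [128, 112]]
expected, chi2, df = chi_square_independence(observed)

print([round(v, 2) for v in expected[0]], [round(v, 2) for v in expected[1]])
print(round(chi2, 2), df, chi2 > 10.827)
```

The expected frequencies agree with those of Table 27, and the sum of the four quotients is about 53.29 with 1 degree of freedom.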

It may be pointed out that in a 2 × 2 table the value of $\chi^2$ could have
been obtained directly from the formula

$$\chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a + b)(c + d)(a + c)(b + d)} \qquad (5.23)$$

In our problem,

$$\chi_0^2 = \frac{[(115)(112) - (11)(128)]^2 (366)}{(126)(240)(243)(123)} = 53.29$$

The correction for continuity, devised by Yates (Ref. 25), is useful
for extending the application of the $\chi^2$-test of significance to contingency
tables with small frequency, that is, to data in which the expectations
are small. Cochran (Ref. 6) has presented and illustrated the principles
involved in correcting for continuity on some applications of $\chi^2$.

The process of calculating $\chi^2$ for a 2 × 2 table can be extended to the
general case of the r × c contingency table. In general, in a table of
r rows and c columns the number of degrees of freedom in $\chi^2$ is
(r - 1)(c - 1). Bartlett (Ref. 1) devised a method of calculating $\chi^2$ for
multiple-dichotomous tables, that is, those of the form $2^N$. Norton
(Ref. 17) presented and illustrated a method of successive approximation
for obtaining the R departures from expectation in a complex contingency
table of the form $2^N \times R$.

Problem V.16. To test the homogeneity of two or more frequency
distributions. A useful application of the $\chi^2$-test is in testing the
hypothesis that two or more frequency distributions could have come from
the same homogeneous population. This is a more stringent test than
those tests of the significance between certain summary statistics of the
distributions, since by it the distributions are compared in all respects.
Furthermore, it is possible to separate the contributions to $\chi^2$ of the
individual degrees of freedom, and so to test the distributions by parts.

The following example illustrates the case where there are two
distributions and $n'$ classes with $n' - 1$ degrees of freedom. The method of
calculating $\chi^2$ devised by Brandt and Snedecor (Ref. 11) is followed.

The two samples are distributions of two groups of freshmen entering
a particular college of the University of Minnesota classified according
to college aptitude-test rating. One distribution, of 475 students,
presented two units of high-school mathematics; the other, of 111 students,
presented three units of high-school mathematics at the time of entrance.
We wish to test the hypothesis that these two samples are from the same
homogeneous population with respect to aptitude as measured, or whether
there is a significant difference between the two distributions.

If we denote the column of frequencies of the group with two units
by $a'$, that of the group with three units by $a$, the value of $\chi^2$ is given by
the formula

$$\chi^2 = \frac{1}{\bar{p}\bar{q}}\left(\sum ap - n_1\bar{p}\right) \qquad (5.24)$$

where

$$p = \frac{a}{a + a'}, \qquad \bar{p} = \frac{n_1}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$

The calculations of $\chi^2$ for the test of significance of the homogeneity
of the two frequency distributions are given in Table 29.


TABLE 29

Calculation of $\chi^2$ for Two Frequency Distributions, One with Two Units
of High-School Mathematics, the Other with Three, Grouped according to
Percentile Ranks on the College Aptitude Test

Class intervals        Units of high-school mathematics    p = a/(a + a')    ap
in percentile ranks    Two (a')        Three (a)
91-100                 18              10                  .357143           3.571430
81- 90                 33              12                  .266666           3.199992
71- 80                 39              14                  .264151           3.698114
61- 70                 43               3                  .652087           1.956261
51- 60                 39              13                  .250000           3.250000
41- 50                 51               3                  .055550           0.166650
31- 40                 47              12                  .203390           2.440680
21- 30                 66              14                  .175000           2.450000
11- 20                 68               8                  .105263           0.842104
 0- 10                 71              22                  .236559           4.204298
Total                  475 ($n_2$)     111 ($n_1$)         .189420 ($\bar{p}$)   25.779529 ($\sum ap$)

$$\chi_0^2 = \frac{1}{(.18942)(.81058)}\left[25.779529 - (111)(.18942)\right] = 30.96$$

$P < .001$; for $n = 9$, $\chi^2_{.001} = 27.877$

For $\chi_0^2 = 30.96$ with $n = 9$, we enter the $\chi^2$-table and find that for
values of $\chi^2$ greater than 27.877 the divergencies between the observed
frequencies in the two distributions could have arisen by chance with a
probability of less than .001. We do not know the value of $P$
corresponding to a value of $\chi^2 = 30.96$, but the probability of such a
divergence arising by chance is less than 1 in 1000. We may conclude,
therefore, that there is a statistically significant difference between the
two distributions. The pedagogical conclusion is that groups presenting
three units of high-school mathematics are superior on the whole on the
College Aptitude Test to the groups presenting two units.

It is possible to separate the contributions to $\chi^2$ from each of the
individual degrees of freedom, and so to test the distributions by parts.

For 4 degrees of freedom the calculations for $\chi^2$ are

Percentile ranks on
College Aptitude Test     Two units     Three units     Total        p

   81-100                     51             22            73      .301370
   61- 80                     82             17            99      .171717
   41- 60                     90             16           106      .150943
   21- 40                    113             26           139      .187050
    1- 20                    139             30           169      .177515

   Total                     475            111           586

$\chi_0^2 = 7.3438$
$P > .05$










For 1 degree of freedom:

Percentile rank on
College Aptitude Test     Two units     Three units     Total        p

   Above 80 P.R.              51             22            73      .301370
   80 and below              424             89           513      .173490

   Total                     475            111           586      .189420 (\bar{p})

$\chi_0^2 = 6.8066$
$\chi^2_{.01} = 6.635$
$P < .01$
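The two grouped calculations can be checked with a short program. The following is a minimal sketch in Python, not part of the original text: the function name `chi_square_homogeneity` is an assumption made for illustration, and the two-unit count of 90 for the 41-60 class is taken as 106 - 16 from the row total. It applies formula (5-24) to the 4-degree and 1-degree groupings and should give values close to 7.3438 and 6.8066.

```python
def chi_square_homogeneity(a, a_prime):
    """Chi-square for a 2 x k table by formula (5-24).

    a       -- frequencies of the group with total n1 (three units)
    a_prime -- frequencies of the group with total n2 (two units)
    """
    n1 = sum(a)
    n2 = sum(a_prime)
    p_bar = n1 / (n1 + n2)
    q_bar = 1.0 - p_bar
    # sum of a * p, where p = a / (a + a') in each class
    sum_ap = sum(x * x / (x + y) for x, y in zip(a, a_prime))
    return (sum_ap - n1 * p_bar) / (p_bar * q_bar)


# Five classes (4 degrees of freedom): 81-100, 61-80, 41-60, 21-40, 1-20
three_units_5 = [22, 17, 16, 26, 30]
two_units_5 = [51, 82, 90, 113, 139]  # 90 = 106 - 16 (row total less three-unit count)

# Two classes (1 degree of freedom): above 80, 80 and below
three_units_2 = [22, 89]
two_units_2 = [51, 424]

chi2_4df = chi_square_homogeneity(three_units_5, two_units_5)
chi2_1df = chi_square_homogeneity(three_units_2, two_units_2)
print(f"4 degrees of freedom: {chi2_4df:.4f}")
print(f"1 degree of freedom:  {chi2_1df:.4f}")
```

The number of degrees of freedom is one less than the number of classes in each case, since the two column totals are fixed.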


The portion of the distribution contributing the most to the differences
is, accordingly, in the highest percentile ranks, or 81-100.⁷

7 The reduction in the $\chi^2$-values with coarser grouping of the data is noted. This
result is to be expected with the reduction in the number of degrees of freedom and the
corresponding approach to the zero tail of the $\chi^2$-distribution.

Problem V.17. To test the agreement between a theoretical and an
observed distribution. One general method of testing a statistical
hypothesis is to work out the results which would be expected
theoretically under the assumption that the hypothesis is true, and then to
compare these with the observations. The chi-square test provides an
efficient test of the goodness of fit. As an illustration we shall test the
hypothesis that a set of data presented by Roberts et al. (Ref. 19) is
described by a Poisson series.

The data given in Table 30 were obtained by administering the Binet
Test (a shortened form) to a group of children who passed all but one
of a complete year of tests, and then extending the testing downward to
determine how many of these pupils failed in one, two, or more tests.

TABLE 30

Additional Tests Failed on Downward Extension of the Binet Scale to a
Sample of 131 Children
(After Roberts, Ref. 19)

Number of         Observed            Expected
tests failed      frequency, f_o      frequency,* f_t      \chi^2 †

     0                 88                 87.41              0.004
     1                 34                 35.37              0.053
     2                  8                  7.16  \
     3                  1                  0.97   |
     4                  0                  0.10   |          0.070
     5                  0                  0.01  /

   Total              131                131.02              0.127

* The theoretical distribution is obtained as follows:

1. Calculate the mean number of tests failed: $\bar{x} = m = 0.4046$.

2. Calculate the expected frequencies. This is done by means of logarithms.
Thus:

Quantity                                    Logarithm          Expected frequency

n = 131                                     2.11727
e^m:  m log e = (.4046)(.43429)             0.17571
n/e^m                                       1.94156                 87.41

m = 0.4046                                  9.60703 - 10
mn/e^m                                      1.54859                 35.37

m                                           9.60703 - 10
                                            1.15562
2                                           0.30103
m^2 n/2e^m                                  0.85459                  7.155

m                                           9.60703 - 10
                                            0.46162
3                                           0.47712
m^3 n/(2)(3)e^m                             9.98450 - 10             0.965

m                                           9.60703 - 10
                                            9.59153 - 10
4                                           0.60206
m^4 n/(2)(3)(4)e^m                          8.98947 - 10             0.0976

m                                           9.60703 - 10
                                            8.59650 - 10
5                                           0.69897
m^5 n/(2)(3)(4)(5)e^m                       7.89753 - 10             0.007898

† Chi-square is determined in the usual manner by calculating
$\sum (f_o - f_t)^2 / f_t$. The classes from 2 through 5 have been grouped
because of small frequencies. This grouping could have been done without
calculating the theoretical values for the classes beyond the third. The
calculations were made to illustrate the method.

$\chi_0^2 = 0.127$ with $n = 1$. The corresponding probability value is
$.70 < P < .80$. There is 1 degree of freedom, since the sample mean has been
used as the parameter of the Poisson distribution and the sample number has
been used to calculate the theoretical frequencies.

From the calculations it is noted that for $\chi_0^2 = 0.127$ with $n = 1$,
the corresponding probability is between .70 and .80. Therefore, we
may conclude that the Poisson distribution provides a good fit to this
set of data.
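The logarithmic working of Table 30 can be reproduced directly. The sketch below, in Python, is an illustration only and not the author's procedure: it computes the Poisson frequencies without logarithms, groups classes 2 through 5 as in the table, and forms chi-square. Because the expected frequencies are not rounded, the result may differ slightly in the third decimal from the tabled 0.127.

```python
import math

# Observed numbers of children failing 0, 1, ..., 5 additional tests (Table 30)
observed = [88, 34, 8, 1, 0, 0]
n = sum(observed)

# Step 1: mean number of tests failed, used as the Poisson parameter m
m = sum(k * f for k, f in enumerate(observed)) / n

# Step 2: expected frequencies n * e^(-m) * m^k / k!
expected = [n * math.exp(-m) * m**k / math.factorial(k)
            for k in range(len(observed))]

# Group classes 2 through 5 because of small frequencies
observed_grouped = observed[:2] + [sum(observed[2:])]
expected_grouped = expected[:2] + [sum(expected[2:])]

chi_square = sum((fo - ft) ** 2 / ft
                 for fo, ft in zip(observed_grouped, expected_grouped))

# One degree of freedom is lost for the total and one for the estimated mean
degrees_of_freedom = len(observed_grouped) - 2

print(f"m = {m:.4f}")
print("expected:", [round(ft, 3) for ft in expected])
print(f"chi-square = {chi_square:.3f} with {degrees_of_freedom} degree of freedom")
```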


Problems 

1. The following are two distributions A and B from the Miller Anal¬ 
ogies Test. Determine whether they are random samples from the 
same population. 
A B


85 

53 

43 

77 

53 

79 

48 

66 

50 

65 

75 

48 

51 

56 

99 

57 

75 

67 

47 

53 

99 

79 

53 

79 

80 

88 

76 

84 

48 

86 

67 

75 

76 

69 

75 

62 

77 

69 

84 

83 

89 

77 

75 

54 

69 

78 

48 

96 

56 

72 

71 

27 

48 

90 

76 

75 

76 

67 

89 

84 
2. The following data are from an experiment comparing the relative
efficacy of two different methods of teaching beginning high-school
algebra. There are XXIV pairs of students, paired on the basis of
chronological age and a pretest in arithmetic. There were two
criteria of achievement: (1) scores on an inventory test, (2) scores
on an achievement test. The values under “Exp.” and “Con.”
refer to the experimental and control groups, respectively. Test the
null hypothesis in this experiment.


Data for Problem 2

           Chron. Age        Arith.         Inventory      Achievement
Pairs     Exp.    Con.    Exp.    Con.     Exp.    Con.    Exp.    Con.

I         152     157     99      96       57      57      50      33
II        172     157     87      87       61      67      32      20
III       173     177     85.5    86.5     60      62      24      28
IV        169     166     85      86.5     55      55      28      19
V         160     156     85      86.5     50      50      33      23
VI        168     162     82.5    82       50      50      31      28
VII       171     169     96.5    97.5     56      57      36      28
VIII      160     156     92      91       56      56      43      31
IX        177     171     83      86       60      60      33      21
X         165     161     85.5    86.5     57      56      33      21
XI        164     165     87.5    84.5     57      56      38      27
XII       167     161     84.5    83       56      56      28      28
XIII      171     171     96.5    96       57      57      35      20
XIV       168     169     99.5    99       50      51      42      24
XV        175     177     83      80.5     56      56      29      27
XVI       172     175     93      90       56      56      41      18
XVII      169     170     81.5    79       56      57      35      20
XVIII     161     167     90      87.5     56      56      36      26
XIX       165     171     87.5    87.5     56      56      28      28
XX        174     168     83      85.5     56      56      20      27
XXI       176     175     93      94       51      50      29      29
XXII      170     165     77      79.5     42      50      42      29
XXIII     174     172     77      79.5     56      56      29      24
XXIV      174     170     86      86       50      50      38      30



3. (a) In Problem 2 determine the statistical significance of the
differences in achievement of the two groups by considering only the
signs of the respective differences between the scores of individual
pair members.

(b) Compare the efficiency of the test used in (a) with that of the
test used in Problem 2.

4. Determine the significance of the difference of the percentage of 
those taking the second test, reaching or exceeding the median 
score of the group taking the pretest in the fall of 1935. 



[Figure: horizontal axis, Percentile Rank on Algebra Test, marked from 10 to 100]

5. For the following distribution (a) calculate the variance from the
grand mean; (b) calculate the variance from the sample means;
(c) note the extent of agreement.
(d) Why is it necessary to pay proper regard to the number of
degrees of freedom?

Sample     I    II   III    IV     V    VI   VII  VIII    IX     X

          12    25    18     8    20    21    15    24    28    29
          29    25    22    21    22    19    23    10    25    21
          22    23    17    17    24    12    18    23    14    18
          20    14    23    20    14    11    23    22    20    16
          22    24    11    14    22    14    20    20    29    22

6. A check-up on the reading habits of seventh-grade pupils reveals
that 55 per cent of the 558 voluntary readings of one random sample
of pupils were of the mystery and detective type, whereas only 45 per
cent of the 122 voluntary readings of another random sample of pupils
were of this classification. Is there statistical evidence here that
interest in the mystery and detective type of reading is higher in one
sample than in the other?

7. In an attitude test administered to an experimental group of 796
students and a control group of 861, Item 306 was answered correctly
by 51 in the experimental group and by 47 in the control group.
(a) Is there a statistically significant difference between the
proportion of the experimental group that answered this item correctly and
the proportion of the control group that answered it correctly? What
is the statistical hypothesis tested? (b) In Item 35, 37 of the
experimental group and 37 of the control group answered this item
correctly. Answer the above questions in regard to this item.

8. The following measures were obtained from an examination in
personal hygiene for a winter quarter class and for a spring quarter class.
Determine the significance of the difference between means. May
the variances be assumed equal? What hypothesis is under test?
What is the most appropriate test of the hypothesis?

Winter quarter class:
   Mean = 20.56
   Sum of squares of deviations from the mean = 28,255
   Number = 675

Spring quarter class:
   Mean = 22.07
   Sum of squares of deviations from the mean = 12,535
   Number = 350

9. In a given situation n = 81, mean = 40, and standard deviation = 8. 
If we assume that the standard deviation of the increased number of 
cases will remain approximately the same as given, what size of 
sample is necessary to reduce the standard error of the mean to .5? 

10. The following data indicate the frequency of intrapair differences in
handedness in identical twins and in the handedness of their
immediate relatives:

                                       Identical twins
                                       R-R        R-L

Without left-handed relatives          105         25
With left-handed relatives              26         22

Total                                  131         47

Is the principle of classification effective?

11. Following are two distributions of entering freshmen, the one having
had no high-school work in foreign languages, the other having had
two or more units in foreign languages. Test the independence of
the two distributions as wholes and by parts.

Frequency subgroups,            Units in high-school
percentile ranks on             foreign language
College Aptitude Test           None        Two or more

      91-100                      8             20
      81- 90                      7             27
      71- 80                     11             33
      61- 70                      8             29
      51- 60                     15             25
      41- 50                     10             34
      31- 40                     11             33
      21- 30                     35             38
      11- 20                     24             50
       1- 10                     45             50

                             N_1 = 174      N_2 = 339


12. The following data were obtained from four random samples of
entering freshmen on a chemistry aptitude test:

Entering Group        N        \bar{x}        s

    1938             35         18.66        3.58
    1939             48         17.23        4.75
    1940             42         18.67        4.95
    1941             30         19.53        3.09

    Total           155         18.39        4.33

Test the homogeneity of the standard deviations.

13. The following coefficients of correlation were reported between
intelligence quotients (X) and chronological ages (Y) for two random
samples of students in a course in elementary-school science. Test
the significance of the difference between the two correlation
coefficients:

Sample 1: $N_1 = 96$, $r_{xy} = -.507$

Sample 2: $N_2 = 66$, $r_{xy} = -.455$

14. The following correlation coefficients were obtained upon a random
sample of 74 pupils in the sixth grade of an elementary school:

$r_{xy} = .55$; $r_{xz} = .81$; $r_{yz} = .44$

where x = score on an initial achievement test
      y = mental-age score
      z = score on a final achievement test.

Test the significance of the difference between $r_{xz}$ and $r_{yz}$.

15. Test the significance of the differences among the following
correlation coefficients reported for the illustrative problem in multiple
correlation (see page 332).

$r_{1y} = .1784$     $r_{3y} = .5164$     $r_{5y} = .6704$

$r_{2y} = .6505$     $r_{4y} = .0993$
CHAPTER VI

THE ESTIMATION OF POPULATION PARAMETERS 

The Problem of Estimation. The estimation of characteristics of a
population, that is, the estimation of the parameter values of the
population, is a fundamental statistical problem. In such a problem we usually
begin with an assumption about or knowledge of the mathematical form
of the population of which we presume to have a random sample. We do
not have a knowledge of the values of one or more parameters in the
mathematical form. These values are required for the complete
specification of the population.

In general, there are a number of ways of estimating a parameter 
from sample data, some of which may be better than others. The theory 
of estimation provides a basis for investigating the conditions which an 
estimate should fulfill, for determining the best estimate to use under 
given circumstances, and for comparing the relative effectiveness of 
different estimates that might be used. 

In its most practical form, the problem of estimation is met by the
research worker in his attempt to reduce his original data to a few
summary quantities which shall contain all the relevant information,
that is, all information which is of use in estimating the values of the
parameters. The problem of estimation is closely related to that of
distribution, since both arise in the process of reducing data. From the
logical standpoint, problems of distribution precede problems of
estimation, since knowledge of the random distributions of various alternative
statistics, derived from samples of a given size, is basic in the selection
of the particular statistic most useful to calculate.

The problem of specification, or the specification of the mathematical
form of the distribution of the hypothetical population from which a
sample is assumed to have been drawn, completes the theoretical basis
upon which depends the solution of the problems which arise in the
reduction of data. Although the three problems may be studied separately,
evidently they are closely related in the development of statistical methods.
Our purpose here is to study especially the problem of estimation. This
is the problem of determining how observational data can be best
combined to yield the most accurate estimates obtainable of the unknown
parameters. Two procedures of estimation are considered: (1) estimation
by a point and (2) estimation by an interval.

In order to judge whether one particular estimate or a group of
estimates is better than others, criteria are needed. Three criteria have been
advanced: (1) consistency, (2) efficiency, and (3) sufficiency. Statistics
which satisfy these criteria are known as optimum estimates or optimum
statistics (Ref. 13).

Characteristics of Good Estimates 

In order to be consistent, the value of a statistic must approach more
and more closely the estimated parameter as the sample size is indefinitely
increased. Such a value is a function of the observations which
converges stochastically to a population parameter as the sample number
approaches infinity. An efficient estimate is one whose sampling
distribution tends to the normal law with the least possible standard error as the
number of observations is increased. Efficiency requires that the
variance of the estimate (at least for large samples) should not exceed that
of any other consistent statistic estimating the same parameter. The
square of the ratio of the minimum standard error to the standard error
of another estimate (also normally distributed in the limit) gives a
measure of the relative efficiency of the second estimate. The criterion
of sufficiency is satisfied by a statistic when no other statistic calculable
from the same sample can supply any additional information regarding
the parameter under estimation. A sufficient statistic is inevitably also
100 per cent efficient, since it incorporates the whole of the information
available in the sample in regard to a given parameter.
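The measure of relative efficiency can be illustrated by simulation. The sketch below, in Python, is an illustration that does not appear in the original text: it compares the sample median with the sample mean as estimates of the center of a normal population. The sample size, number of trials, and seed are arbitrary assumptions. For large normal samples the ratio of the two sampling variances is known to approach 2/π, or about 0.64.

```python
import random
import statistics


def sampling_variance(estimator, n, trials, rng):
    """Variance of an estimator over repeated normal samples of size n."""
    values = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
              for _ in range(trials)]
    return statistics.pvariance(values)


rng = random.Random(12345)   # fixed seed so that the run is repeatable
n, trials = 101, 4000        # illustrative choices, not from the text

var_mean = sampling_variance(statistics.fmean, n, trials, rng)
var_median = sampling_variance(statistics.median, n, trials, rng)

# Square of the ratio of standard errors = ratio of sampling variances
relative_efficiency = var_mean / var_median
print(f"relative efficiency of the median: {relative_efficiency:.2f}")
```

An efficiency near 0.64 means that the median from about 100 observations is roughly as precise as the mean from about 64.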

The Measurement of Amount of Information. It is apparent that
these criteria for judging the goodness of estimates require the knowledge
of the amount of information that is available in any sample relevant to
the population parameter under estimation. Fisher (1921, 1925) showed
how to measure the quantity of information provided by the
observational data, relevant to the value of any particular unknown quantity.
The mathematical quantity used to specify the amount of measurable
information is the reciprocal of the variance, or the invariance, of the
estimate.

The class of estimates which, as the sample is increased without limit,
tend to be distributed about their limiting value (their mathematical
expectation) in the normal distribution is the one appropriate to the
theory of large samples. The amount of information afforded by an
estimate normally distributed with variance $V$ is $1/V$, the invariance of
that normal distribution. In the normal case, the variance decreases
with increasing size of sample, $n$, always ultimately in inverse proportion
to $n$.

The criterion of efficiency, noted above, is that the limiting value of
$nV$, where $V$ is the variance of the estimate, shall be as small as possible.
Fisher (Ref. 11) proved mathematically that the limiting value of $1/nV$
cannot exceed a quantity $i$, the amount of information provided by each
observation, the value of which is independent of the method of
estimation. It was shown that the reciprocal of the variance, or the invariance
of the estimate, cannot exceed the amount of information in the sample.
Thus:

$$\frac{1}{V} \le ni = I \qquad (6.01)$$

This conclusion is dependent on proof that for certain estimates the
limiting value of

$$\frac{1}{nV} = i \qquad (6.02)$$
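As a worked illustration, which is not taken from the original text, consider a Poisson population with mean $m$, estimated by the sample mean $\bar{x}$ of $n$ observations. Here the bound in (6.01) is attained exactly, so that (6.02) holds for every $n$. The expression used for $i$ is the standard expected squared derivative of the log probability, which is assumed here rather than quoted from the text.

$$\log f(x; m) = -m + x \log m - \log x!, \qquad
i = E\left[\left(\frac{\partial \log f}{\partial m}\right)^{2}\right]
  = E\left[\left(\frac{x}{m} - 1\right)^{2}\right] = \frac{1}{m}$$

$$V(\bar{x}) = \frac{m}{n}, \qquad \frac{1}{V} = \frac{n}{m} = ni, \qquad \frac{1}{nV} = \frac{1}{m} = i$$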


The Maximum Likelihood Estimate. The instrument supplied by
Fisher for obtaining the estimates necessary for the limiting value (6.02)
to hold is the method of maximum likelihood. By this method, estimates
of the parameters are obtained which maximize the likelihood function
and have the smallest limiting variance. The limiting value of the
sampling variance of the maximum likelihood estimate in large samples
was proved to satisfy

$$\frac{1}{nV} = i$$

We may state here that the probability of occurrence of a sample is
expressible as a function of the unknown parameters, and the likelihood
is defined as a function of these parameters proportional to this
probability. Thus, the method of maximum likelihood gives as estimates those
values which maximize the probability that the totality of observations
should be that observed if the hypothesis which specifies the parameters
of the population sampled is true.
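A small numerical illustration may make the definition concrete. The sketch below, in Python, is an illustration only: it takes the observed frequencies of Table 30, assumes a Poisson population, and finds the value of the parameter $m$ that maximizes the logarithm of the likelihood by a simple golden-section search. The function names and the search interval are assumptions made for the example. The maximizing value should agree with the sample mean, 0.4046.

```python
import math

# Observed numbers of children failing 0, 1, ..., 5 additional tests (Table 30)
observed = [88, 34, 8, 1, 0, 0]
n = sum(observed)


def log_likelihood(m):
    """Poisson log-likelihood of the sample, omitting terms free of m."""
    return sum(f * (-m + k * math.log(m)) for k, f in enumerate(observed))


def maximize(func, lo, hi, tol=1e-7):
    """Golden-section search for the maximum of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) > func(d):
            b = d      # the maximum lies in [a, d]
        else:
            a = c      # the maximum lies in [c, b]
    return (a + b) / 2.0


m_hat = maximize(log_likelihood, 0.01, 5.0)   # search interval is an assumption
sample_mean = sum(k * f for k, f in enumerate(observed)) / n

print(f"maximum likelihood estimate: {m_hat:.4f}")
print(f"sample mean:                 {sample_mean:.4f}")
```

For the Poisson distribution the maximum can also be found analytically by setting the derivative of the log-likelihood to zero, which gives the sample mean directly; the numerical search is shown only because it carries over to cases with no closed solution.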

In large samples the maximum likelihood estimate has the smallest
variance in comparison with any other statistic which is in the limit
normally distributed. If the comparisons were restricted to statistics
which in the limit are normally distributed, the utility of this method of
estimation would be greatly limited. However, a stronger property
than efficiency is possessed by the maximum likelihood estimate. This
property exists when estimates may be made which contain within
themselves the whole of the information available for finite samples. This is
the property of sufficiency. Where sufficient statistics exist, all the
available information is contained in the maximum likelihood estimate.
In random samples from a normal population, the mean and the standard
deviation (the only two characteristics necessary to specify this
population) are sufficient statistics. It is this fact that gives the great
simplicity to the problems falling within the theory of errors. Thus, in much
experimental work it is necessary to be concerned only with the precision
of the sum, or mean, of the observational values and with the estimation
of this precision from the sum of squares calculated from the data. These
two quantities contain all the information provided by the data with
respect to the mean and variance of the hypothetical normal model. In
cases where no sufficient statistic exists, Fisher has shown how the
information in the sample may be recovered by using as ancillary the
configuration of the sample. The configuration serves to indicate the
precision of the estimate made, although it gives no information about
the value of the parameter itself.

In experiments where the variance of the population is not known, it
must be estimated from the data. Such an estimate is itself subject to
error. For this error, exact allowance is made in the distribution of $t$
when we test the significance of the deviation of the observed value from a
hypothetical value specified by hypothesis. In such cases it would be
inexact to assume that the amount of information provided by the
experimental results with respect to the true value under estimation
would be given by $1/s^2$, the reciprocal of the sampling variance. In
determining the absolute precision of the experimental result, not only
the estimate, $s^2$, derived from the data but also the number of degrees
of freedom used in the estimate need to be taken into account. In this
case it has been shown (Ref. 11, page 249) that the amount of information
provided by an observed value, $x$, relative to the unknown mean
population value, $\mu$, is given by

$$\frac{n + 1}{(n + 3)s^2} \qquad (6.03)$$

where $n$ is the number of degrees of freedom.
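The effect of the number of degrees of freedom in (6.03) is easily tabulated. The following Python sketch is an illustration only: the function name `information_about_mean` and the value $s^2 = 4$ are assumptions chosen for the example. It shows that the information is always less than $1/s^2$ and approaches it as the degrees of freedom increase.

```python
def information_about_mean(s2, df):
    """Amount of information by (6.03): (n + 1) / ((n + 3) * s^2).

    s2 -- estimated sampling variance of the observed value
    df -- number of degrees of freedom, n, on which s2 is based
    """
    return (df + 1) / ((df + 3) * s2)


s2 = 4.0                # illustrative value, not from the text
naive = 1.0 / s2        # information if s2 were known exactly

for df in (5, 10, 30, 100):
    info = information_about_mean(s2, df)
    print(f"n = {df:3d}: information = {info:.4f} "
          f"({100 * info / naive:.1f} per cent of 1/s^2)")
```

With 10 degrees of freedom, for instance, the factor is 11/13, so that about 15 per cent of the apparent information is lost through the uncertainty of the variance estimate.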

Other Methods of Estimation. The most important general method 
of estimation so far discovered, at least from the theoretical standpoint, 
is the method of maximum likelihood. It will be frequently encountered 
in later discussions. There are other methods of estimation which should 
be considered. Under certain conditions all methods may yield similar 
results. 

The oldest general method of forming estimates of the parameters
of a distribution from sample values is the method of moments introduced
by Karl Pearson, in which sample moments are equated to the
corresponding moments of the distribution which are functions of the unknown
parameters. As many moments as there are parameters requiring
estimation are taken into account. The obtained equations with reference
to the parameters are solved to give the estimates of the parameters.
The fitting of the normal curve to a series of observations illustrates
the process of the method of moments. The moment coefficients often
involve relatively simple calculations in practice, but their efficiency
decreases when the variations among the observations depart widely
from normality.
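The fitting of a normal curve by moments can be sketched in a few lines. The Python example below is an illustration only. It uses the five values of Sample I of Problem 5 of the preceding chapter purely as convenient data, equates the first two sample moments about the origin to the corresponding moments of the normal distribution, $\mu$ and $\sigma^2 + \mu^2$, and solves for the two parameters.

```python
# Sample I of Problem 5 (Chapter V), used here only as illustrative data
sample = [12, 29, 22, 20, 22]
n = len(sample)

# First two sample moments about the origin
m1 = sum(sample) / n
m2 = sum(x ** 2 for x in sample) / n

# Normal distribution: E(x) = mu, E(x^2) = sigma^2 + mu^2.
# Equating these to m1 and m2 and solving gives the estimates.
mu_hat = m1
sigma2_hat = m2 - m1 ** 2

print(f"estimate of the mean:     {mu_hat}")
print(f"estimate of the variance: {sigma2_hat:.1f}")
```

The variance estimate obtained in this way divides the sum of squares by the sample number rather than by the degrees of freedom, so it is not the unbiased estimate.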

The criterion of testing the closeness of an estimate in terms of a
minimum standard deviation of its sampling distribution has been
considered. Likewise, the criterion of testing closeness of fit of the estimates
to certain parameters by a minimum $\chi^2$-value has been used. Both
these criteria are satisfied by the method of maximum likelihood in
dealing with large samples.

The original use of the $\chi^2$-test by Pearson (Ref. 27) was in the case
of a completely specified hypothetical distribution. In this case it was
established that $\chi^2$, under the assumption that the hypothesis is true, is
distributed in repeated sampling in a $\chi^2$-distribution with $r - 1$ degrees
of freedom ($r$ is the number of groups into which the sample values have
been classified). Most often in practice, the hypothetical distribution
contains one or more unknown parameters. In these cases certain
modifications were necessary in finding the limiting distribution of $\chi^2$.
Fisher (Refs. 9 and 4) showed that, for certain important methods of
estimation, the modification could be made by reducing the number of
degrees of freedom of the limiting distribution of $\chi^2$ by one for each
estimated parameter.

The method of estimation yielding a minimum $\chi^2$-value is known as
the $\chi^2$ minimum method of estimation. In practice, the method often
leads to difficult solutions, so that certain modifications have resulted
in what is known as the modified $\chi^2$ minimum method (Ref. 2, page 426).
In certain cases this method is identical with the maximum likelihood
method. In the case of fitting certain distributions, for example the
binomial, the Poisson, and the normal distributions, the two
methods give the same results. The method of maximum likelihood,
however, can be extended to problems more general in nature.

A method of estimation developed by Markoff (Refs. 21 and 26) is
based on the principle of unbiased estimates. Markoff has shown in
various cases how to construct linear forms in the observational data
which give estimates of certain unknown parameters that have no bias
and the variances of which have the smallest possible value. The process
of obtaining the best unbiased estimate of the population variance, $\sigma^2$, is
based on this principle, for example, $s^2 = \dfrac{\sum (x - \bar{x})^2}{N - 1}$.

Point Estimation and Its Limitations. The procedures of estimation
just discussed may be called estimation by a point. A single value is
given as the “best” estimate of the true or population value. Such a
procedure does not provide a basis for specifying the degree of confidence
one may place in such an estimate. It is known, of course, from sampling
theory that the estimate made is not likely to be exactly equal to the
population value. With large homogeneous samples the discrepancy is
small, but with small samples the discrepancy may be considerable.
Point estimation does not take directly into account the size of the sample
which supplies the unique estimate. Because of these limitations in the
method of point estimation, estimation by intervals seems to be
increasing in use.

There are, of course, many occasions when a single value estimate is
needed, particularly for certain subsequent statistical analyses. In the
case of interval estimation, the single estimate is wanted as material
for a subsequent process of estimation.

Estimation by Interval. We cannot tell from any sample estimate 
whether it is too great or too small. For this purpose further samples 
from the same population would be needed. It seems obvious therefore, 
that what is required is an interval of some kind which may be expected 
to include or cover the true population value in a specified number of 
cases. From the sample value and other ancillary information, we can 
calculate the point values of the upper and lower limits of the interval 
and then proceed to state that this interval will include or cover the 
population value. From sampling theory we can calculate the number of 
times in repeated sampling that the statement would be correct. Thus, 
the proportion of cases in which the statement may be assumed to 
be correct provides a measure of the confidence to be ascribed to our 
statement. 

Fiducial Limits. R. A. Fisher (Refs. 10 and 3) first introduced the
method of estimation based on the concepts of fiducial probability and
fiducial limits. The basic ideas underlying Fisher’s theory may be
presented as follows.

Observations in the experimental or observational sciences are
concrete and specific occurrences. They are now freely applied as a basis
for probability statements about parameters whose exact values are
unknown except for the information available in the observations. The
kind of reasoning employed here comes from tests of significance, and the
probability statements are designated as statements of fiducial
probability, in order to distinguish such statements from those about “inverse
probability.” Fisher (Ref. 12) has indicated the fundamental
random-variable relation which connects sample and population. The essential
step in establishing this relation is in the following assertion: Irrespective
of the character of the sample, the probability that the population
parameter shall fall in any range is derived from the known probability, $P$,
which is defined as the function of the variable, and from the test or the
pivotal quantity in the test of significance. The assertion requires only
that the unknown parameter value shall fall in the range corresponding
to these known quantities. In this sense is to be interpreted the
somewhat paradoxical statement that a sample with known characteristics is a
random sample of an unknown population.

The properties of variable statistics are derived from observations
which are defined as random variables involving parameters upon which
their distribution functions are dependent. These properties are used
to establish the connections between the probability distribution of the
random variable and the distribution of the statistic used as the pivotal
quantity in the test of significance. The statistic used as the pivotal
quantity is functionally independent of the population from which the
sample is drawn. This connection, once established, gives meaning to the
practical situation where the statistics are observable but the parameters
are unknown.

An illustration (Ref. 12) serves to show the application of this process
of reasoning or form of fiducial argument. Following it, one may go
from forms of statements embodying observations as random variables
to forms of statements embodying observations as fixed data. In the
former, the distribution functions include certain fixed but unknown
parameters; in the latter, the frequency distributions are derived for the
unknown parameters considered as random variables.

Let $\xi$ be the median of a distribution concerning which the only thing
known is that its probability integral is continuous. Take the case
where $n = 2$, that is, where $X_1$ and $X_2$ are two observational values of the
variable $X$. For any given value of $\xi$, the three cases, that $X_1$ and
$X_2$ (a) should both exceed the median, (b) should lie on either side of it,
(c) should both be less than it, must occur in the frequency ratio 1:2:1.
If $r$ stands for the number of observations less than the median, then $r$
becomes a pivotal quantity involving both the unknown parameter and the
observations, with a sampling distribution independent of the parameter;
that is, $r$ takes the values 0, 1, and 2 with probabilities 1/4, 1/2, and
1/4, respectively. This leads to the fiducial argument from the two given
observations, now considered as fixed parameters, that the probability is
.25 that $\xi$ is less than both $X_1$ and $X_2$; .50 that $\xi$ lies between
$X_1$ and $X_2$; and .25 that $\xi$ exceeds both $X_1$ and $X_2$. This
reasoning thus leads to a frequency distribution of $\xi$, now considered as
a random variable.

For a sample of any size, $n$, the following quantity expresses the
probability that the median shall exceed $r$ of the observations and be less
than $n - r$:

$$\frac{n!}{r!\,(n - r)!}\; 2^{-n} \qquad (6.04)$$
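Expression (6.04) is simply the binomial term with probability one-half, and it may be evaluated for any sample size. The Python sketch below is an illustration only, and the function name `median_position_probabilities` is an assumption. It reproduces the values .25, .50, and .25 for $n = 2$, and then gives, for $n = 10$, the probability that the median lies between the smallest and the largest observation.

```python
from math import comb


def median_position_probabilities(n):
    """Probabilities, by (6.04), that the median exceeds exactly r of n
    observations, for r = 0, 1, ..., n."""
    return [comb(n, r) / 2 ** n for r in range(n + 1)]


probs_2 = median_position_probabilities(2)
print("n = 2:", probs_2)

# Probability that the median lies within the range of a sample of 10,
# that is, that it exceeds at least 1 and at most 9 of the observations
probs_10 = median_position_probabilities(10)
within_range = sum(probs_10[1:10])
print(f"n = 10: probability within the sample range = {within_range:.4f}")
```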


Confidence Intervals. The complete theory underlying the method 
of interval estimation developed by Neyman (Ref. 25) cannot be presented 
here. However, the definition and use of the two concepts of confidence 
intervals and of confidence coefficients are presented briefly. 

Consider a sample of n random variables $X_1, X_2, \ldots, X_n$, the n
observational values. Denote by E the set of values of the X variables.
This set can be represented by a point, called the sample point E, in an
n-dimensional space, the rectangular coordinates of E being
$X_1, X_2, \ldots, X_n$. Assume that the probability law of the sample
$X_1, X_2, \ldots, X_n$, though known, is given in terms of two parameters
$\theta_1$ and $\theta_2$, which are unknown. It is desired to make an
estimate of one of the parameters, say $\theta_1$.

The process of estimating $\theta_1$ consists in constructing two functions
of the observations, $\underline{\theta}(E)$ and $\overline{\theta}(E)$, and
in estimating the parameter to be within the interval
$\delta(E) = [\underline{\theta}(E), \overline{\theta}(E)]$. It is important
to point out certain properties of the functions $\underline{\theta}$ and
$\overline{\theta}$. Since they are functions of the sample values,
$X_1, X_2, \ldots, X_n$, they are both random variables and will vary from
sample to sample as the sample point $X_1, X_2, \ldots, X_n$ varies. Since
they are random variables, the probabilities of $\underline{\theta}$ and
$\overline{\theta}$ lying within or without any specified limit may be
considered.

Denote by $\theta_1^0$ the true value of the parameter $\theta_1$ in a
particular problem. Then $\underline{\theta}(E)$ and $\overline{\theta}(E)$
should have this property: the probability that, when $\theta_1^0$ and
$\theta_2$ are the true values of the two parameters,
$\underline{\theta}(E)$ is less than $\theta_1^0$ and
$\overline{\theta}(E)$ is greater than $\theta_1^0$ is equal to $\alpha$;
that is,

$$P[\underline{\theta}(E) < \theta_1^0 < \overline{\theta}(E) \mid \theta_1^0, \theta_2] = \alpha \tag{6.05}$$

The interval extending from $\underline{\theta}(E)$ to
$\overline{\theta}(E)$ in (6.05) is called the confidence interval
corresponding to the sample point E, and the value $\alpha$ (for example,
0.95 or 0.99), the confidence coefficient. What is required in (6.05) is a
probability of a specified value, whatever the values of $\theta_1$ and
$\theta_2$, calculable from the probability law depending on $\theta_1$
and $\theta_2$. Thus the functions $\underline{\theta}$ and
$\overline{\theta}$ must satisfy (6.05) identically, also for all possible
values of $\theta_2$.

The meaning of the confidence interval may be said to be this: Assume
that a large number of samples are drawn randomly from a population
obeying the specified elementary probability law. If in each case the
statement is made that $\theta_1^0$ is included in the interval
$[\underline{\theta}(E), \overline{\theta}(E)]$, then the relative
frequency of correct statements will be approximately equal to the
confidence coefficient, $\alpha$. For example, take $\alpha$ = 0.95. If 100
samples are taken and 100 confidence intervals set up, it may be expected
that 95 per cent of these intervals will include or cover the true value,
say $\theta_1^0$. It should be noted that this statement is not equivalent
to the statement that the probability is 95 out of 100 that $\theta_1^0$
lies between the limits $\underline{\theta}$ and $\overline{\theta}$. This
discrepancy is explained by the fact that $\theta_1^0$ is not a random
variable but an unknown constant. Consequently, the probability of
$\theta_1^0$ falling within specified limits may be either zero or unity,
depending on whether the actual value of $\theta_1$ falls without or
within the limits.
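The relative-frequency meaning of the confidence coefficient can be checked by simulation. The following is a minimal sketch under assumptions of our own: a normal population with known standard deviation, the interval formed as the sample mean plus or minus a normal deviate times the standard error, and a hypothetical function name `coverage`. It counts how often the interval covers the fixed true mean.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated samples whose interval covers the true mean.

    Assumes a normal population with known sigma; the interval is
    xbar +/- y * sigma / sqrt(n), with y taken from the normal curve.
    """
    rng = random.Random(seed)
    y = NormalDist().inv_cdf(0.5 + alpha / 2)
    half_width = y * sigma / n**0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # The true mean is a constant; only the interval varies.
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials


print(coverage(mu=30.0, sigma=10.0, n=5, alpha=0.95, trials=4000))
```

With several thousand trials the printed fraction should fall close to 0.95. For any single interval, however, the true value is either covered or not.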

Further development of the theory (Ref. 24) indicates that there 
exists an infinite number of confidence intervals for a given confidence 
coefficient. Hence, some principle is needed as a basis for choosing from 
among them. One principle is to select the shortest system of intervals. 
Shortest confidence intervals, however, exist to a considerable extent only 
in exceptional cases. Other principles, such as unbiasedness, have been 
used; but even shortest unbiased confidence intervals exist in only a 
restricted class of cases. A third type of interval has been called the
“short-unbiased” confidence interval. If there is more than one
parameter, there is not often a confidence interval for one of the
parameters which is independent of the other parameters. With more than
one parameter the set of points constitutes a simple closed region, if it
exists, rather than a single interval as in the case of only one
parameter. In the case of several parameters, new problems arise. But the
description of the basic ideas has been given in the situation described
above.

Fiducial versus Confidence Intervals. It appears that Fisher’s theory 
of fiducial probability and Neyman’s theory of confidence intervals are 
closely related and that in a number of practical cases they may lead to 
the same form of procedure. The authors, however, indicate a disagreement
in the logical foundations as well as in certain practical applications.
Neyman (Ref. 23) has attempted to develop a general procedure which 
will supply rules for setting up from observational data an interval 
that will cover the unknown parameter with a given probability. Fisher 
(Ref. 7) indicates that a unique probability measure associated with a 
particular interval is needed. This measure is defined as a fiducial 
probability. An essential point of agreement is in the interpretation 
that the probability of, say, 0.95 is not the probability that the parameter 
estimated lies between any fixed limits but, rather, that a variable
statement about this parameter formulated in accordance with a specified rule
will be correct. Fisher expresses it by stating that there is a fiducial 
probability of 95 per cent of the unknown parameter’s lying within the 
specified fiducial limits. According to Neyman, the statement would 
be made that the specified interval will cover the true value and that we 
know that the statement will be correct 95 times out of 100. 

Fisher (1935) has emphasized that a fiducial statement can be made 
only in terms of the estimate if the estimate of the unknown parameter 
has the property of sufficiency, because only in this case does the estimate 
elicit the whole of the available information. Neyman’s confidence 
intervals are apparently of more general applicability. When an estimate 
is sufficient, both the fiducial limits and the limits of Neyman’s shortest 
confidence interval or of his short unbiased confidence interval depend 
on this property of sufficiency. The interval would not, however, always
be the same in the two cases because of the use by Neyman of an
additional principle in the determination of his intervals.

It would appear, however, that the two procedures would be
interchangeable in at least the first two examples that follow.

Problems of Interval Estimation 

Problem VI.1. Estimation of the population mean. The first problem
consists in estimating $\mu$, the mean of a normal population of known
variance $\sigma^2$, given a sample mean $\bar{X}$ based on n items.

From our study of sampling theory, we know that the means of
random samples from a normal population, for example, the $\bar{X}$'s, are
normally distributed about $\mu$ with a standard deviation (called the
standard error of the mean, $\sigma_{\bar{X}}$) equal to $\sigma/\sqrt{n}$.
Hence, we know the proportion of sample means which will lie within the
interval: $\mu \pm$ some multiple of $\sigma_{\bar{X}}$. The confidence
interval may be written as

$$\bar{X} - y_\alpha \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + y_\alpha \frac{\sigma}{\sqrt{n}} \tag{6.06}$$

where $y_\alpha$ is the value of $y = (\bar{X} - \mu)/\sigma_{\bar{X}}$
for a given confidence coefficient $\alpha$, which can be read from a normal
probability integral table. If $\alpha$ = 0.99, then $y_\alpha$ = 2.576. If
$\alpha$ = 0.95, then $y_\alpha$ = 1.96 no matter what n is. For example, we
find that 99 per cent of the sample means will fall within the interval
$\mu \pm 2.576\sigma_{\bar{X}}$, and 95 per cent within the interval
$\mu \pm 1.96\sigma_{\bar{X}}$. On the basis of sampling theory, if in
repeated sampling we take the interval extending from a lower limit of
$\bar{X} - 2.576\sigma_{\bar{X}}$ to an upper limit of
$\bar{X} + 2.576\sigma_{\bar{X}}$, then this interval will cover the
population mean, $\mu$, in 99 per cent of cases.

We may take as a practical illustration the 100 samples of 5 items
each drawn from the population with $\mu$ = 30, $\sigma$ = 10. For samples
of 5 numbers $\sigma_{\bar{X}}$ will be

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{5}} = 4.472$$


Using a 95 per cent confidence coefficient, we take the intervals extending
from $\bar{X}$ - (1.96)(4.472) to $\bar{X}$ + (1.96)(4.472), or from
$\bar{X}$ - 8.77 to $\bar{X}$ + 8.77. We calculated these intervals for the
100 sample means given in Table 3, page 34. They are recorded in Table 31.
It is noted that only one of the 100 intervals calculated, namely,
10.63-28.17, does not include the population mean 30.


TABLE 31

Confidence Intervals for the Means of 100 Random Samples of Size 5, Using
a Confidence Coefficient of 95 per Cent
(Population $\mu$ = 30; $\sigma$ = 10)

14.23-31.77    29.83-47.37    12.83-30.37    25.03-42.57    24.83-42.37
28.03-45.57    18.23-35.77    19.83-37.37    20.23-37.77    15.03-32.57
18.63-36.17    19.63-37.17    21.03-38.57    23.43-40.97    19.23-36.77
24.03-41.57    18.83-36.37    22.03-39.57    31.83-49.37    23.23-40.77
30.23-47.77    23.43-40.97    18.43-35.97    21.03-38.57    15.03-32.57
23.43-40.97    25.83-43.37    24.43-41.97    13.83-31.37    27.03-44.57
23.03-40.57    23.63-41.17    21.83-39.37    20.03-37.57    24.03-41.57
16.83-34.37    17.83-35.37    28.83-46.37    22.63-40.17    16.83-34.37
14.63-32.17    22.83-40.77    23.23-40.77    23.83-41.37    23.43-40.97
23.03-40.57    23.03-40.57    19.43-36.97    17.23-34.77    17.23-34.77
22.43-39.97    21.03-38.57    26.03-43.57    20.83-38.37    14.43-31.97
26.23-43.77    21.83-39.37    23.23-40.77    14.83-32.37    19.63-37.17
20.43-37.97    22.43-39.97    25.83-43.37    23.03-40.57    22.23-39.77
19.03-36.57    17.63-35.17    29.23-46.77    21.63-39.17    18.43-35.97
13.03-30.57    18.23-35.77    17.63-35.17    19.43-36.97    12.83-30.37
13.83-31.37    16.83-34.37    20.43-37.97    23.63-41.17    29.03-46.57
29.03-46.57    19.23-36.77    25.03-42.57    13.63-31.17    15.03-32.57
13.03-30.57    15.23-32.77    15.83-33.37    30.03-47.57    20.63-38.17
15.83-33.37    22.63-40.17    15.43-32.97    28.83-46.37    20.43-37.97
10.63-28.17    26.23-43.77    29.63-47.17    22.63-40.17    23.83-41.37

The sampling experiment was repeated by taking random samples of
size 50 instead of 5. The means of 100 samples of 50 items each were
calculated. Again, the intervals were set up by using a confidence
coefficient of 95 per cent, which in this case extended from
$\bar{X}$ - (1.96)(1.4142) to $\bar{X}$ + (1.96)(1.4142), or from
$\bar{X}$ - 2.77 to $\bar{X}$ + 2.77. We found that the population mean 30
was covered in 97 of the 100 cases.

An extension of the sampling experiment was made to obtain the
means of 100 samples of 100 items each. The confidence intervals with a
confidence coefficient of 95 per cent were calculated again, given by the
limits $\bar{X}$ - 1.96 and $\bar{X}$ + 1.96. We noted that the population
mean 30 was covered in 96 of the 100 cases.

In all three of the sampling experiments, therefore, there was a close 
agreement between theory and observation. We noted also that the 
confidence intervals become shorter as the size of the sample is increased. 
Therefore, the larger the sample, the more accurately can the true or 
population value be estimated. 
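The shortening of the interval with increasing sample size can be verified directly from (6.06). The following is a minimal sketch, not part of the original text; the function name `mean_confidence_interval` is our own, and the normal deviate is computed rather than read from a table. It uses the population of the sampling experiments, with standard deviation 10.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, sigma, n, alpha=0.95):
    """Limits of (6.06) for a sample mean xbar when sigma is known."""
    y_alpha = NormalDist().inv_cdf(0.5 + alpha / 2)  # about 1.96 for 0.95
    half_width = y_alpha * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Half-widths for the three sample sizes of the sampling experiments.
for n in (5, 50, 100):
    lower, upper = mean_confidence_interval(30.0, 10.0, n)
    print(n, round((upper - lower) / 2, 2))
```

The half-widths agree, to two decimals, with the values 8.77, 2.77, and 1.96 used above.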

Problem VI.2. Estimation of the population mean of a normal population
of unknown variance. Nearly always, in experimental work,
neither the mean nor standard deviation of the population from which
we are sampling is known. In estimating the population mean in such
cases, we have to use the mean and standard deviation of the sample
and the distribution of t. We shall calculate the fiducial values according
to Fisher (Ref. 5, pp. 195-198).

A fundamental principle in the use of the t-distribution for the solution
of this problem is: If an estimate of a parameter is normally distributed
with a variance which can be estimated from the sample and the
distribution of which is independent of the estimate of the parameter, then
fiducial limits can be calculated from “Student's” ratio.

The following are the characteristics of t which give it its unique 
utility for the solution of this type of problem: 

(a) The distribution of t is known with exactitude, without any
supplementary assumptions or approximations.

(b) t is given by the single unknown parameter, $\mu$, and by observable
statistics only.

(c) The statistics involved in the quantity t are sufficient. 

The quantity t is expressed by:

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{\sqrt{n}\,(\bar{X} - \mu)}{s} \tag{6.07}$$

where

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

It is noted that since all terms on the right-hand side of (6.07) except
$\mu$ are observable, the fiducial values of $\mu$ are determinable when
values of t appropriate to any chosen level of significance, $\epsilon$,
have been chosen. Furthermore, $\bar{X}$ and $s^2$ are independently
distributed; and the two quantities, the sum and the sum of squares,
calculated from the data are sufficient statistics, since they contain all
the relevant information concerning the mean and variance of the
hypothetical normal curve. Therefore, we may write

$$\mu = \bar{X} \pm t_\epsilon\, s_{\bar{X}} \tag{6.08}$$

as the corresponding fiducial limits for the value of $\mu$. With respect
to $\mu$, it may then be said that the fiducial probability is
(1 - $\epsilon$) that it will lie within these fiducial limits.

As a practical illustration, we may set up the fiducial limits of the true 
mean difference based on the data from the controlled experiment given 
in Problem V.6, page 75, in which the null hypothesis was rejected at the 
5 per cent level. 

The following quantities were obtained:

$$\bar{X} = 9.28, \qquad n = 25, \qquad \sum (X - \bar{X})^2 = 6809.04$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{6809.04}{24} = 283.71$$

We wish to set up fiducial limits with a fiducial probability of 95 per
cent. Accordingly,

$$t_\epsilon = t_{.05} = \pm 2.064 \quad (\text{for } n - 1 = 24)$$

and

$$\mu = 9.28 \pm (2.064)(3.368) = 2.33 \text{ or } 16.23 \tag{6.09}$$

With respect to the population mean $\mu$, it may be said that it has a
fiducial probability of 2.5 per cent of being less than 2.33 or of being
greater than 16.23, and, in the same sense, a probability of 95 per cent of
lying within these fiducial limits.
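The arithmetic of this illustration may be retraced as follows. This is a minimal sketch, not part of the original text; the function name `fiducial_limits_mean` is our own, and the critical value 2.064 is supplied from the t-table as quoted above rather than computed.

```python
from math import sqrt


def fiducial_limits_mean(xbar, sum_sq_dev, n, t_crit):
    """Limits xbar +/- t * s / sqrt(n), following (6.07) and (6.08).

    sum_sq_dev is the sum of squared deviations from the sample mean;
    t_crit is read from a t-table for n - 1 degrees of freedom.
    """
    s = sqrt(sum_sq_dev / (n - 1))    # sample standard deviation
    standard_error = s / sqrt(n)      # about 3.368 for these data
    return xbar - t_crit * standard_error, xbar + t_crit * standard_error


lower, upper = fiducial_limits_mean(9.28, 6809.04, 25, 2.064)
print(round(lower, 2), round(upper, 2))  # 2.33 16.23
```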

Problem VI.3. Estimation of the population variance from the sample
value. If the experimenter is interested in determining whether the
variance or the standard deviation of a normal distribution could exceed
a given value or could lie in a given range, a test of significance is needed
for which the pivotal quantity should possess the following characteristics:

(a) Its exact sampling distribution must be known.

(b) It must be expressible in terms of the unknown variance, $\phi$, of the
distribution sampled, together with known statistics only.

(c) The statistics involved in the expression of the quantity must be
sufficient.

It is known that $(n - 1)s^2/\phi$ is distributed as is $\chi^2$ for n - 1
degrees of freedom. That is, if $\chi^2/(n - 1)$ is the ratio of the
estimate of the population variance as obtained from the sample for n - 1
degrees of freedom to the true variance, $\phi$, then $\chi^2$ is
distributed, independently of the population mean and variance, in a
distribution determinable from the number of degrees of freedom (Ref. 6).

The upper and lower hundred $\epsilon$ per cent fiducial limits of $\phi$
can be obtained from tables of the $\chi^2$-distribution. If the two
critical values of $\chi^2$ are represented by $\chi_1^2$ and $\chi_2^2$,
the fiducial range of $\phi$ will be the interval

$$\left[\frac{(n-1)s^2}{\chi_1^2},\ \frac{(n-1)s^2}{\chi_2^2}\right]$$

As our practical illustration, we set up the fiducial limits of the
variance of the distribution based on the data in Problem V.6. If we
take $\epsilon$ = .05, the probable lower limit of the value of $\phi$ is
found by noting that, for n - 1 = 24, $\chi^2$ exceeds 36.415 in only 5 per
cent of trials (see Table III, Appendix).

Substituting this value of $\chi^2$ in the equation

$$\chi^2 = \frac{\sum (X - \bar{X})^2}{\phi} \tag{6.10}$$

we have

$$36.415 = \frac{6809.04}{\phi}, \qquad \phi = \frac{6809.04}{36.415} = 186.98$$

Similarly, the probable upper limit to the value of $\phi$ is obtainable by
first noting that $\chi^2$ for n - 1 degrees of freedom is less than the
value 13.848 in only 5 per cent of trials. Substituting this value for
$\chi^2$ in Equation (6.10),

$$13.848 = \frac{6809.04}{\phi}$$

we get

$$\phi = \frac{6809.04}{13.848} = 491.70$$

We may say, then, that the fiducial probability is 5 per cent that the
variance should exceed 491.70, and likewise 5 per cent that it should be
less than 186.98, and, in the same sense, a fiducial probability of 90 per
cent of the variance lying within these fiducial limits. If a linear
measure of variation is wanted, the corresponding fiducial limits for the
population standard deviation are 22.17 and 13.67, respectively.
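The two limits follow from Equation (6.10) by simple division. The following is a minimal sketch, not part of the original text; the function name `fiducial_limits_variance` is our own, and the two critical values of chi-square are supplied from the table as quoted above rather than computed.

```python
from math import sqrt


def fiducial_limits_variance(sum_sq_dev, chi2_upper, chi2_lower):
    """Lower and upper limits of the variance from (6.10).

    chi2_upper is the value exceeded in only a small fraction of trials,
    chi2_lower the value fallen short of in the same fraction; both are
    taken from a chi-square table for n - 1 degrees of freedom.
    """
    lower = sum_sq_dev / chi2_upper   # the large chi-square gives the lower limit
    upper = sum_sq_dev / chi2_lower
    return lower, upper


var_lower, var_upper = fiducial_limits_variance(6809.04, 36.415, 13.848)
print(round(var_lower, 2), round(var_upper, 2))
print(round(sqrt(var_lower), 2), round(sqrt(var_upper), 2))
```

The second line of output gives the corresponding limits for the standard deviation, obtained as square roots of the variance limits.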

Problem VI.4. Estimation of an individual’s true score from his obtained
score on a test. We assume that the scores an individual would
obtain on a very large number of equivalent tests are distributed in a
normal manner about his true score with a standard deviation equal to the
standard error of an individual score, $\sigma_x = s\sqrt{1 - r}$, where s
is the standard deviation of the distribution of scores and r is the
reliability coefficient of the test. The upper and lower limits of the
confidence interval of his true score, $\xi$, are given by

$$X \pm Y_\alpha \left(s\sqrt{1 - r}\right) \tag{6.11}$$

where $Y_\alpha$ is the value of $y = (X - \xi)/\sigma_x$ for a given
confidence coefficient, $\alpha$, which is read from the normal probability
integral table; X is the obtained score; and $\sigma_x$ is the standard
error of X.

As an illustration, let us set up the confidence interval for the true
score of a pupil who receives an I.Q. rating of 105 on a particular
intelligence test on which the standard error of an individual score is 4
I.Q. points. Using a confidence coefficient of 98 per cent, the upper and
lower limits of the confidence interval are, respectively,
105 + (2.326)(4) = 114.3 and 105 - (2.326)(4) = 95.7. We then state that
the interval (95.7, 114.3) will cover the true I.Q. score of this
individual, and we know that our statement concerning the true score,
$\xi$, will be correct in 98 per cent of such cases.
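Formula (6.11) may be sketched in a few lines. This is an illustration only, not part of the original text; the function names are our own, the normal deviate is computed rather than read from a table, and the pair s = 16, r = 0.9375 is a hypothetical choice that happens to give a standard error of 4 points.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_score(s, r):
    """Standard error of an individual score, s * sqrt(1 - r)."""
    return s * sqrt(1 - r)


def true_score_interval(obtained, standard_error, alpha=0.98):
    """Limits of (6.11) for the true score, given the obtained score."""
    y_alpha = NormalDist().inv_cdf(0.5 + alpha / 2)  # about 2.326 for 0.98
    half_width = y_alpha * standard_error
    return obtained - half_width, obtained + half_width


# Hypothetical s and r chosen so that the standard error is 4 points.
se = standard_error_of_score(16.0, 0.9375)
lower, upper = true_score_interval(105.0, se)
print(round(lower, 1), round(upper, 1))  # 95.7 114.3
```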

Problem VI.5. Estimation of the confidence interval for the population
median in samples from any continuous population. We have considered
the sampling distributions of certain statistics calculated from
random samples involving only one of the unknown parameters specifying
a parent population of known form. The method of interval estimation
was used to set up in terms of the observations at any level the confidence
interval for the unknown population parameter.

Thompson (Ref. 30) and Savur (Ref. 29) independently obtained the
confidence interval for the median without reference to the form of parent
population. Cases arise in which the population form is unknown or, as
in small samples, in which it is not easy to test an assumption of
normality. Here the interval estimation of the median as a measure of
location is especially useful. Nair (Ref. 22) used the results of Thompson,
restricted to continuous populations, to construct a table of confidence
intervals for the median, the use of which makes the problem of estimation
extremely simple.

In a random sample of n observations $X_1, X_2, \ldots, X_k, \ldots, X_n$
arranged in ascending order of magnitude, if $P_k$ is the probability
integral of $X_k$, then

$$P(X < X_k) = P(P < P_k) = \frac{n!}{(k-1)!\,(n-k)!} \int_P^1 p^{k-1}(1-p)^{n-k}\,dp = I_{1-P}(n-k+1,\ k) \tag{6.12}$$

where $I_x(p, q)$ is the function tabulated in the Incomplete Beta Function
Table. By definition, the probability integral corresponding to the
median, M, is 1/2. Therefore,

$$P(M < X_k) = P(\tfrac{1}{2} < P_k) = I_{0.5}(n-k+1,\ k)$$

Also,

$$P(M < X_k) = P(M > X_{n-k+1})$$

Hence,

$$P(X_k < M < X_{n-k+1}) = 1 - 2I_{0.5}(n-k+1,\ k) \tag{6.13}$$

which is the confidence interval of the population median. It states that
the unknown population median will lie in the interval extending from the
kth to the (n - k + 1)th observation in
$100[1 - 2I_{0.5}(n-k+1,\ k)]$ per cent of the cases.

With the aid of the Incomplete Beta Function Tables, Nair (Ref. 22)
prepared the Table of Confidence Intervals for the Median for values of n
from 6 through 81, for confidence coefficients of 0.95 and 0.99. This
table consists in finding k such that, given n,

$$I_{0.5}(n-k+1,\ k) = 0.025 \text{ or } 0.005$$

Since k can have only integral values, the confidence coefficient cannot
be fixed exactly at 0.95 or 0.99 for all values of n. Values of k are taken
which bring the confidence coefficient

$$1 - 2I_{0.5}(n-k+1,\ k)$$

nearest to (and greater than) the conventional values of 0.95 or 0.99.

For values of n larger than 81, Nair (Ref. 22) suggests the use of the 
normal curve as an approximation where x, the relative deviate, is given 
by 



For a given confidence coefficient, such as 0.95 or 0.99, the
corresponding value of x can be obtained from the Normal table, and the
value of k can be determined from the relation

As an illustration of the use of Nair’s Tables, we shall set up the
confidence interval for the population median, M, using the sample data
given in Problem V.6, page 75. The sample median, Md, of the individual
pair differences is 10; n = 25.

We enter Nair’s Table given in (Ref. 22) with n = 25. The arguments
and values for n = 25 given in the table are as follows:

        Confidence coefficient ≥ 0.95          Confidence coefficient ≥ 0.99
 n      k    n - k + 1   I_0.5(n - k + 1, k)   k    n - k + 1   I_0.5(n - k + 1, k)
25      8       18             .0216           6       20             .0020


For a sample of 25 observations, we can say that, with a confidence
coefficient of 95.68 per cent, that is, 100[1 - 2(.0216)], the population
median M will lie between the eighth and eighteenth ranked observations,
that is, between 17 and 3. We can also say that, with a confidence
coefficient of 99.6 per cent, the population median, M, will lie between
the sixth and twentieth ranked observation, that is, between 24 and -6.
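The tabled values can be checked without the Incomplete Beta Function Tables. The following is a minimal sketch, not part of the original text; the function name is our own, and it rests on the standard identity that, at the argument 0.5, the incomplete beta function I(n - k + 1, k) equals the sum of the first k terms of the symmetric binomial with index n.

```python
from math import comb


def median_confidence_coefficient(n, k):
    """Confidence coefficient 1 - 2 * I_0.5(n - k + 1, k) of (6.13).

    I_0.5(n - k + 1, k) is evaluated as the binomial tail
    sum of C(n, j) / 2**n for j = 0, ..., k - 1 (an assumed identity,
    stated in the lead-in, not taken from the text).
    """
    tail = sum(comb(n, j) for j in range(k)) / 2**n
    return 1 - 2 * tail


print(round(median_confidence_coefficient(25, 8), 4))  # 0.9567
print(round(median_confidence_coefficient(25, 6), 4))  # 0.9959
```

The two results agree with the tabled tail values .0216 and .0020 for n = 25.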

Problem VI.6. Setting up the confidence interval on a population
difference from a given sample difference in percentages. If percentages
are obtained within a sample, that is, the percentages of “yes” and
“no” answers to a given question, the problem arises of how to get
confidence limits on a population difference, $\delta$, for a given sample
difference, d. Wilks (Ref. 32) gives the 99 per cent sampling limits of d as

$$\delta \pm 2.58\sqrt{\frac{100(P_1 + P_2) - \delta^2}{n}} \tag{6.16}$$

That is, in drawing repeatedly random samples of size n from a population
in which the “yes” and “no” percentages are $P_1$ and $P_2$, respectively,
approximately 99 per cent of the samples have a difference d which lies
between these two limits. In practice, the sample values of the difference
and of the two percentages are substituted for the unavailable population
values $\delta$, $P_1$, and $P_2$. This procedure may be satisfactory for
practical purposes.

Wilks gives the quantity $258/\sqrt{n}$ as a simple, conservative critical
value of the sample difference d. If +d is larger than $258/\sqrt{n}$, the
probability is at least 0.99 that $\delta$, the population difference,
would be included between two positive confidence limits. The more common
interpretation would be that at the 1 per cent level of significance a true
difference $\delta$ between the “yes” and “no” percentages exists in the
population.

As an illustrative problem, let us test the significance of the difference
between the two proportions in a random sample of 77 male graduates
of a teachers' college, 22 of whom remained in the teaching profession
and 55 of whom left it within 10 years after graduation. The approximate
test of significance given by the pivotal quantity $258/\sqrt{n}$ shows
that d, or 42.8 per cent, exceeds $258/\sqrt{77} = 29.4$ and is hence
significant at the 1 per cent level. The 99 per cent confidence interval is
given by substituting the sample values for $\delta$, $P_1$, and $P_2$ in
(6.16). Thus:

$$d \pm 2.58\sqrt{\frac{100(71.4 + 28.6) - (42.8)^2}{77}} = 42.8 \pm 26.5$$

or the confidence interval of the population difference, $\delta$, with a
confidence coefficient of 99 per cent is (16.3, 69.3).
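The limits of (6.16) and the conservative critical value may be sketched as follows. This is an illustration only, not part of the original text; the function names are our own, and the sample percentages are substituted for the population values as described above.

```python
from math import sqrt


def within_sample_difference_limits(p1, p2, n, z=2.58):
    """Limits d +/- z * sqrt((100 * (p1 + p2) - d**2) / n), after (6.16).

    p1 and p2 are percentages observed in the same sample of size n.
    """
    d = p1 - p2
    half_width = z * sqrt((100 * (p1 + p2) - d**2) / n)
    return d - half_width, d + half_width


def conservative_critical_value(n, z=2.58):
    """The quantity 258 / sqrt(n) when z = 2.58."""
    return 100 * z / sqrt(n)


print(round(conservative_critical_value(77), 1))  # 29.4
lower, upper = within_sample_difference_limits(71.4, 28.6, 77)
print(round(lower, 1), round(upper, 1))
```

Carried to more decimals the half-width is about 26.6, so the computed limits differ slightly in the last figure from the rounded values (16.3, 69.3) given above.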

Problem VI.7. Setting up a confidence interval of a population
difference from the difference between two sample percentages. The
problem of comparing two percentages in different random samples differs
from that in Problem VI.6 in that in the latter there is a negative
correlation between the percentages of “yes” and “no” answers. No
correlation exists in the percentages in the two different samples. Wilks
(Ref. 32) gives 99 per cent sampling limits of d, the sample difference,
about the population difference $\delta$ as

$$\delta \pm 2.58\sqrt{\frac{P_1(100 - P_1)}{n_1} + \frac{P_2(100 - P_2)}{n_2}} \tag{6.17}$$

and the corresponding conservative critical limit for d as

$$129\sqrt{\frac{n_1 + n_2}{n_1 n_2}} \tag{6.18}$$

If instead of calculating d, the difference between the percentages $P_1$
and $P_2$, we first transform the percentages to the inverse sine function
(see page 164), then

$$d' = 100\left(\sin^{-1}\sqrt{P_1/100} - \sin^{-1}\sqrt{P_2/100}\right) \tag{6.19}$$

Then $129\sqrt{(n_1 + n_2)/(n_1 n_2)}$ is, to a close approximation at the
1 per cent level, an exact critical limit of d'.

As an illustration, we shall set up the 99 per cent confidence interval
for the population difference from the two samples of percentages of
color-blindness in the two sexes of the Caucasoid population (see Problem
V.8, page 80).

From the data,

$$P_1 = 8.4, \qquad P_2 = 1.3, \qquad d = 7.1$$

The 99 per cent confidence interval is obtained by substituting the sample
values $P_1$, $P_2$, and d in (6.17):

$$7.1 \pm 2.58\sqrt{\frac{8.4(91.6)}{793} + \frac{1.3(98.7)}{232}} = 7.1 \pm 3.18$$

Therefore, the 99 per cent confidence interval of the population difference
is (3.9, 10.3).

Again using Formula (6.19), we get¹

$$d' = 100\left(\sin^{-1}\sqrt{.084} - \sin^{-1}\sqrt{.013}\right) = 100(16.8 - 6.4) = 1040$$

and

$$129\sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 129\sqrt{\frac{1025}{183{,}976}} = 9.6$$

Since 1040 > 9.6, the probability is at least 0.99 that a genuine difference
between the percentages exists in the population.
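Formulas (6.17) and (6.18) may be sketched as follows. This is an illustration only, not part of the original text; the function names are our own. The sample sizes 793 and 232 are not quoted directly in this section: they are inferred here from the ratio 1025/183,976 above, being the only pair of whole numbers with that sum and product, and should be treated as an assumption.

```python
from math import sqrt


def two_sample_difference_limits(p1, n1, p2, n2, z=2.58):
    """Limits d +/- z * sqrt(p1(100-p1)/n1 + p2(100-p2)/n2), after (6.17).

    p1 and p2 are percentages from two independent samples.
    """
    d = p1 - p2
    half_width = z * sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    return d - half_width, d + half_width


def conservative_critical_limit(n1, n2, z=2.58):
    """The quantity 129 * sqrt((n1 + n2) / (n1 * n2)) of (6.18) when z = 2.58."""
    return 50 * z * sqrt((n1 + n2) / (n1 * n2))


# Sample sizes inferred from 1025 / 183,976 (an assumption, see above).
lower, upper = two_sample_difference_limits(8.4, 793, 1.3, 232)
print(round(lower, 1), round(upper, 1))                   # 3.9 10.3
print(round(conservative_critical_limit(793, 232), 1))    # 9.6
```

The factor 50 arises because the product P(100 - P) can never exceed 2500, whose square root is 50.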

Problem VI.8. Setting up the confidence limits for an individual
estimated score. In problems of estimating or predicting a measure of a
characteristic from a knowledge of one or more other characteristics, the
predicted values are subject to error. Here we can use the confidence
interval to show the accuracy of individual estimates and the confidence
that may be placed in the statements made about individuals. We
shall take the case of simple regression, that is, the prediction of one
characteristic from a knowledge of another.² The data are from Problem
V.13, page 88, and we shall set up the confidence interval for each
individual's estimated score from the regression equation, using a
confidence coefficient of 98 per cent. The basic calculations are given in
Table 32.

The standard error of the estimate $Y_E$ for a particular value of X, say
$X_0$, is given by

$$s_{Y_E} = \left\{\frac{s_Y^2(1 - r^2)}{N - 2}\left[1 + \frac{(X_0 - \bar{X})^2}{s_X^2}\right]\right\}^{1/2} \tag{6.20}$$

where $s_{Y_E}$ denotes the standard error of $Y_E$, N is the number of
pairs of observations, and the other quantities have their customary
meanings.

From the formula, it is noted that the errors of the estimates of Y
increase as the quantity $X_0$ departs from the mean of the X-distribution;
also that as the values of r and $s_X$ become larger, the smaller become the
errors of estimation, other factors being equal.

From Problem V.13, we record the following values:

$$Y_E = .9873X - 0.68$$
$$s_X^2 = 157.30$$
$$s_Y^2 = 172.59$$
$$r^2 = 0.8885$$
$$\bar{X} = 53.24$$
$$N = 25$$


¹ Transformation obtained from Fisher and Yates’s Table XII, page 56 of Ref. 13
in Chapter VII.

² For the multivariate case see page 343.

TABLE 32

Standard Errors of Estimated Values of Y for Different Values of X0 with
Corresponding 98 per Cent Confidence Intervals

                                                                   Independent variables arranged
                                                                   in descending order of magnitude
Individual   X0     Y     Y_E     s_YE    t.02 s_YE   Interval       X0     s_YE    t.02 s_YE
   (1)       (2)   (3)    (4)     (5)       (6)         (7)          (8)    (9)       (10)

    1        46    52    44.74   1.056     2.64     42.10-47.38      73    1.707     4.27
    2        38    38    36.84   1.439     3.60     33.24-40.44      73    1.707     4.27
    3        64    63    62.51   1.205     3.01     59.50-65.52      67    1.357     3.39
    4        73    65    71.39   1.707     4.27     67.12-75.66      66    1.305     3.26
    5        61    58    59.55   1.075     2.69     56.86-62.24      66    1.305     3.26
    6        34    33    32.89   1.675     4.19     28.70-37.08      64    1.205     3.01
    7        57    49    55.60   0.955     2.39     53.21-57.99      64    1.205     3.01
    8        66    63    64.48   1.305     3.26     61.22-67.74      61    1.075     2.69
    9        25    24    24.00   2.255     5.64     18.36-29.64      61    1.075     2.69
   10        30    26    28.94   1.926     4.81     24.13-33.75      59    1.006     2.51
   11        45    33    43.75   1.094     2.73     41.02-46.48      57    0.955     2.39
   12        73    71    71.39   1.707     4.27     67.12-75.66      55    0.923     2.31
   13        45    43    43.75   1.094     2.73     41.02-46.48      55    0.923     2.31
   14        55    63    53.62   0.923     2.31     51.31-55.93      52    0.919     2.30
   15        66    70    64.48   1.305     3.26     61.22-67.74      51    0.929     2.32
   16        49    46    47.70   0.965     2.41     45.29-50.11      50    0.944     2.36
   17        64    65    62.51   1.205     3.01     59.50-65.52      49    0.965     2.41
   18        45    46    43.75   1.094     2.73     41.02-46.48      46    1.056     2.64
   19        61    62    59.55   1.075     2.69     56.86-62.24      45    1.094     2.73
   20        52    46    50.66   0.919     2.30     48.36-52.96      45    1.094     2.73
   21        67    68    65.47   1.357     3.39     62.08-68.86      45    1.094     2.73
   22        59    53    57.57   1.006     2.51     55.06-60.08      38    1.439     3.60
   23        55    55    53.62   0.923     2.31     51.31-55.93      34    1.675     4.19
   24        51    52    49.67   0.929     2.32     47.35-51.99      30    1.926     4.81
   25        50          48.68   0.944     2.36     46.32-51.04      25    2.255     5.64


Substituting these values in Equation (6.20) and using the $X_0$ for
each of the 25 individuals, we obtain the $s_{Y_E}$ for each individual.
These values are recorded in column (5), Table 32. Using the confidence
coefficient of 98 per cent, we find from the t-table that the value of
$t_{.02}$ for n = N - 2 = 23 is 2.5. Therefore, for any particular value of
$X_0$ the upper and lower limits of the confidence interval will be
$Y_E + 2.5s_{Y_E}$ and $Y_E - 2.5s_{Y_E}$. The values of $2.5s_{Y_E}$ are
given in column (6), and the values for the confidence interval, in
column (7).

In column (10) the values of $t_{.02}s_{Y_E}$ have been recorded for values
of $X_0$ [column (8)] arranged in descending order of magnitude. It is clear
from column (9) that the errors of estimation increase considerably as
the value of $X_0$ recedes from the mean of the distribution of X.
Correspondingly, the confidence intervals widen and reflect the increase
in the errors of estimation.
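The entries of Table 32 can be reproduced from Equation (6.20) and the recorded values from Problem V.13. The following is a minimal sketch, not part of the original text; the function and constant names are our own, and the value 2.5 for t is taken from the text rather than computed.

```python
from math import sqrt

# Values recorded from Problem V.13.
SLOPE, INTERCEPT = 0.9873, -0.68
VAR_X, VAR_Y = 157.30, 172.59
R_SQUARED, MEAN_X, N = 0.8885, 53.24, 25


def standard_error_of_estimate(x0):
    """Standard error of the estimated Y at x0, following (6.20)."""
    residual_term = VAR_Y * (1 - R_SQUARED) / (N - 2)
    return sqrt(residual_term * (1 + (x0 - MEAN_X) ** 2 / VAR_X))


def confidence_interval(x0, t_crit=2.5):
    """Limits Y_E -/+ t * s_YE for the estimated score at x0."""
    y_e = SLOPE * x0 + INTERCEPT
    half_width = t_crit * standard_error_of_estimate(x0)
    return y_e - half_width, y_e + half_width


for x0 in (73, 52, 25):
    lower, upper = confidence_interval(x0)
    print(x0, round(standard_error_of_estimate(x0), 3),
          round(lower, 2), round(upper, 2))
```

The standard error is smallest for values of X near the mean, 53.24, and grows as X recedes from it in either direction.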


Confidence Limits and Tolerance Limits 

A distinction should be made between confidence limits and tolerance 
limits. The latter has proved to be a useful statistical concept in the 
quality control of manufacturing products and probably can be applied to 
other fields. 

The problem in setting up tolerance limits is that of determining 
limits from the sample information which will include, on the average, a 
specified proportion of the universe or population between them (Ref. 33). 

For an example, let $\bar{X}$ be the sample mean and

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

the sample variance estimate in a sample of size n. The tolerance
limits $L_1$ and $L_2$, which between them will include, on the average, a
proportion $\alpha$ of the universe, are given by

$$\bar{X} \pm t_\alpha \sqrt{\frac{n + 1}{n}}\, s \tag{6.21}$$

The value of $t_\alpha$ can be obtained from the table of the
t-distribution, when the value of $\alpha$ has been specified, for example
as 99 per cent, 95 per cent, or whatever, and n - 1 is the number of degrees
of freedom. In contrast, the confidence limits
$\bar{X} \pm t_\alpha s_{\bar{X}}$ may be said to include the population
mean, $\mu$, with a confidence coefficient of $\alpha$.
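The contrast between the two kinds of limits may be seen numerically. The following is a minimal sketch, not part of the original text; the function names are our own, the form of (6.21) is taken as written above, and the data of Problem V.6 are borrowed purely for illustration.

```python
from math import sqrt


def tolerance_limits(xbar, s, n, t_alpha):
    """Limits xbar +/- t * sqrt((n + 1) / n) * s, as in (6.21)."""
    half_width = t_alpha * sqrt((n + 1) / n) * s
    return xbar - half_width, xbar + half_width


def confidence_limits(xbar, s, n, t_alpha):
    """Limits xbar +/- t * s / sqrt(n) for the population mean."""
    half_width = t_alpha * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Illustration only: mean, variance, and t for 24 degrees of freedom
# borrowed from the data of Problem V.6.
s = sqrt(283.71)
tol = tolerance_limits(9.28, s, 25, 2.064)
conf = confidence_limits(9.28, s, 25, 2.064)
print([round(v, 2) for v in tol])
print([round(v, 2) for v in conf])
```

The tolerance limits are wider than the confidence limits by the factor sqrt(n + 1), since they are meant to include a proportion of the individual values rather than to cover the mean alone.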


The Method of Maximum Likelihood 

We shall illustrate the method of maximum likelihood for determining
the best estimates of the population values by applying the method to the
derivation of the estimates of the five parameters required to specify
a normal correlation surface (Refs. 28 and 19). The five parameters
are the means of the two normal distributions of the variates X and Y,
say $\mu$ and $\xi$, respectively; the two standard deviations,
$\sigma_X$ and $\sigma_Y$; and $\rho$, the correlation between X and Y. It
is assumed that the regression both ways (X on Y or Y on X) is linear and
that the variables X and Y are normally distributed.

If the variables X and Y are normally distributed, then the probability
distributions of X, Y, and XY, namely, P(X), P(Y), and P(X,Y), are

$$P(X) = \frac{1}{\sigma_X\sqrt{2\pi}}\; e^{-\frac{(X-\mu)^2}{2\sigma_X^2}} \qquad (6.22)$$

$$P(Y) = \frac{1}{\sigma_Y\sqrt{2\pi}}\; e^{-\frac{(Y-\xi)^2}{2\sigma_Y^2}} \qquad (6.23)$$

$$P(X,Y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\frac{(X-\mu)^2}{\sigma_X^2} - \frac{2\rho(X-\mu)(Y-\xi)}{\sigma_X\sigma_Y} + \frac{(Y-\xi)^2}{\sigma_Y^2}\right]} \qquad (6.24)$$

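With N pairs of values, the simultaneous probability distribution of all
the N values of X and Y is

$$P(X_1,\ldots,X_N,Y_1,\ldots,Y_N) = \left(\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\right)^{N} e^{-\frac{1}{2(1-\rho^2)}\sum\left[\frac{(X-\mu)^2}{\sigma_X^2} - \frac{2\rho(X-\mu)(Y-\xi)}{\sigma_X\sigma_Y} + \frac{(Y-\xi)^2}{\sigma_Y^2}\right]} \qquad (6.25)$$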


To obtain the maximum likelihood estimates, the process consists in
taking the partial derivatives of the probability function with respect
to the parameters, setting the resulting expressions equal to zero, and
then solving the simultaneous equations for the parameters.
Here, then, the procedure consists in taking the partial derivatives of
$P(X_1, \ldots, X_N, Y_1, \ldots, Y_N)$ with respect to $\mu$, $\xi$, $\sigma_X$,
$\sigma_Y$, and $\rho$. It is convenient to work with the natural logarithms.
Hence, for (6.25) we have

$$\log_e P = -N\log_e 2\pi - N\log_e\sigma_X - N\log_e\sigma_Y - \frac{N}{2}\log_e(1-\rho^2) - \frac{1}{2(1-\rho^2)}\sum\left[\frac{(X-\mu)^2}{\sigma_X^2} - \frac{2\rho(X-\mu)(Y-\xi)}{\sigma_X\sigma_Y} + \frac{(Y-\xi)^2}{\sigma_Y^2}\right] \qquad (6.26)$$


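Then log P is differentiated with respect to $\mu$ and the derivative is set
equal to zero, giving

$$\frac{\partial \log P}{\partial \mu} = \frac{2\sum(X-\mu)}{2(1-\rho^2)\sigma_X^2} - \frac{2\rho\sum(Y-\xi)}{2(1-\rho^2)\sigma_X\sigma_Y} = 0 \qquad (6.27)$$

from which

$$\sigma_Y\sum(X-\mu) = \rho\,\sigma_X\sum(Y-\xi) \qquad (6.28)$$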


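Likewise, differentiating log P with respect to $\xi$, setting the derivative
equal to zero, and reducing, we get

$$\sigma_X\sum(Y-\xi) = \rho\,\sigma_Y\sum(X-\mu) \qquad (6.29)$$

Assuming $\rho^2 \neq 1$, $\sigma_X \neq 0$, $\sigma_Y \neq 0$, we get by solving
equations (6.28) and (6.29) the optimum estimates:

$$\hat{\mu} = \frac{1}{N}\sum X = \bar{X} \qquad (6.30)$$

$$\hat{\xi} = \frac{1}{N}\sum Y = \bar{Y} \qquad (6.31)$$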


Similarly, we may differentiate log P partially with respect to $\sigma_X$,
$\sigma_Y$, and $\rho$, respectively; set the derivatives equal to zero; solve;
and substitute the values given for $\mu$ and $\xi$ in (6.30) and (6.31),
obtaining

$$\hat{\sigma}_X^2 = \frac{1}{N}\sum (X - \bar{X})^2 = s_X^2 \qquad (6.32)$$

$$\hat{\sigma}_Y^2 = \frac{1}{N}\sum (Y - \bar{Y})^2 = s_Y^2 \qquad (6.33)$$

$$\hat{\rho} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{N s_X s_Y}$$

or

$$\hat{\rho} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} \qquad (6.34)$$
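As a numerical check on the five estimates, the sketch below computes them for a small set of paired values and evaluates the log-likelihood of the bivariate normal surface at those values. The data are hypothetical and the function names are illustrative only. Note the divisor N, not N - 1, in the variance estimates.

```python
import math

def bivariate_normal_mle(xs, ys):
    """Maximum likelihood estimates (mu, xi, sigma_x, sigma_y, rho)."""
    n = len(xs)
    mu = sum(xs) / n
    xi = sum(ys) / n
    var_x = sum((x - mu) ** 2 for x in xs) / n      # divisor N
    var_y = sum((y - xi) ** 2 for y in ys) / n      # divisor N
    cov = sum((x - mu) * (y - xi) for x, y in zip(xs, ys)) / n
    rho = cov / math.sqrt(var_x * var_y)
    return mu, xi, math.sqrt(var_x), math.sqrt(var_y), rho

def log_likelihood(xs, ys, mu, xi, sx, sy, rho):
    """Natural log of the joint density of N independent pairs."""
    n = len(xs)
    quad = sum(((x - mu) / sx) ** 2
               - 2 * rho * (x - mu) * (y - xi) / (sx * sy)
               + ((y - xi) / sy) ** 2
               for x, y in zip(xs, ys))
    return (-n * math.log(2 * math.pi) - n * math.log(sx) - n * math.log(sy)
            - 0.5 * n * math.log(1 - rho ** 2) - quad / (2 * (1 - rho ** 2)))

# Hypothetical paired observations, for illustration only.
xs = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
ys = [1.0, 3.0, 6.0, 6.0, 8.0, 11.0]
estimates = bivariate_normal_mle(xs, ys)
best = log_likelihood(xs, ys, *estimates)
print(estimates, best)
```

Moving any one of the five values slightly away from its estimate lowers the log-likelihood, which is what the vanishing of the partial derivatives at a maximum implies.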


Jackson and Ferguson (Ref. 19) have shown that the maximum likelihood
estimate of $\rho$ in the case of samples from a population specified by
four parameters ($\sigma_X = \sigma_Y = \sigma$; $\mu$; $\xi$; $\rho$) is

$$\hat{\rho} = \frac{2\left[\sum XY - \dfrac{(\sum X)(\sum Y)}{N}\right]}{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right] + \left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]} \qquad (6.35)$$

This is the case in determining the reliability coefficient of a test by the
test-retest and alternative equivalent forms methods.

In the same way, the maximum likelihood estimate of $\rho$ obtained from
samples of a population specified by three parameters
($\sigma_X = \sigma_Y = \sigma$; $\mu = \xi$; $\rho$) is

$$\hat{\rho} = \frac{2\sum XY - \dfrac{(\sum X + \sum Y)^2}{2N}}{\sum X^2 + \sum Y^2 - \dfrac{(\sum X + \sum Y)^2}{2N}} \qquad (6.36)$$

This is the case in determining the reliability coefficient of a test by the
split-half method.


Estimating the Reliability of Tests 
The reliability of measurements is a fundamental tenet in all observational
and experimental sciences. The problem of the reliability of instruments of
measurement has, however, received the greatest consideration in
psychology, education, and sociology.

The traditional method of determining the reliability of a test is
through the use of the product-moment correlation coefficient. The term
“reliability of a test” as introduced by Spearman in 1910 was defined
as the (correlation) “coefficient between one half and the other half of
several measurements of the same thing.” (Ref. 19.)

Until recently, the only methods available for measuring the so-called
“reliability of a test” were (1) the test-retest method, in which the same
test is taken twice; (2) the equivalent forms method, in which the
correlation between the scores on equivalent forms of the test is obtained;
and (3) the split-test method, in which the correlation coefficient between
the scores on the odd and even items of the test is obtained. This last
correlation gives an estimate of the reliability of each half of the test.
To obtain the reliability of the whole test, application is made of the
Spearman-Brown formula.

Recently, other approaches to the problem of obtaining reliability 
have been made. A number of methods, both the traditional and the 
more recent, will be discussed and illustrated in the following pages. 

Problem VI.9. Comparison of the split-test and the maximum likelihood
methods. We shall compare the results from determining the reliability
of a test by the split-test method, using the product-moment correlation
coefficient, with the results obtained from applying the maximum
likelihood method. The comparison was made on a test in biology from
which the scores of a random sample of 25 students are listed in Table 33.

TABLE 33

The Scores of a Random Sample of 25 Students on a Biology Test

Individual    Odd, X    Even, Y
     1          227       226
     2          124       111
     3          210       237
     4          178       161
     5          192       188
     6          104        93
     7          191       201
     8          148       168
     9          125       123
    10          141       157
    11          171       178
    12          168       182
    13          129       118
    14          192       222
    15          176       171
    16          172       180
    17          215       224
    18          102       144
    19          177       176
    20          109       125
    21          146       150
    22          180       184
    23          179       193
    24          141       131
    25          141       135
  Total        4038      4178

Before applying the split-test method, it was necessary to test the
underlying assumptions, namely, that the means on the two halves of the
test are equal and that the standard deviations are equal. The t-test
for the former ($t_0 = 1.917$) and the F-test for the latter ($F_0 = 1.238$)
give probability values P > .05. Therefore we may consider the assumptions
satisfied and proceed to determine the correlation between the two
halves of the test by calculating the product-moment correlation
coefficient and by getting the maximum likelihood estimate.

The product-moment correlation coefficient between the scores on the
two halves of the test is given by

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \frac{29{,}812.44}{\sqrt{(28{,}930.24)(35{,}816.64)}} = 0.9262 \ (\text{or } 0.93)$$

The maximum likelihood estimate is given by

$$\hat{\rho} = \frac{2\sum XY - \dfrac{(\sum X + \sum Y)^2}{2N}}{\sum X^2 + \sum Y^2 - \dfrac{(\sum X + \sum Y)^2}{2N}} = \frac{2(704{,}643) - 1{,}350{,}053.12}{681{,}148 + 734{,}044 - 1{,}350{,}053.12} = 0.9093 \ (\text{or } 0.91)$$


Although the difference between the two estimates in this problem 
can not be said to be large, we accept the maximum likelihood estimate 
as the optimum estimate. 

Problem VI.10. Comparison of the product-moment correlation
coefficient and the maximum likelihood estimate for determining the
reliability of a test by means of the equivalent forms method. The
comparison of these two methods of estimating test reliability was made on
the scores of two forms of a reading test made by a random sample of
30 students. The data are given in Table 34.

It was found that the value given for the product-moment correlation 
coefficient was 0.9164 or 0.92 and the maximum likelihood estimate was 
0.9076 or 0.91. Although the difference in this problem is small, we 
accept the maximum likelihood estimate as the optimum. 
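The estimates quoted in Problems VI.9 and VI.10 can be recomputed from the column sums alone. The sketch below is one way to do so, with hypothetical function names. The sums for the biology test are those appearing in the worked formulas of Problem VI.9, and the sums for the reading test are those tabulated with the two forms (N = 30, with sum of products 61,676).

```python
import math

def product_moment_r(n, sx, sy, sxx, syy, sxy):
    """Product-moment correlation from sums, as in (6.34)."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

def ml_equal_variances(n, sx, sy, sxx, syy, sxy):
    """Four-parameter case (equal variances), formula (6.35)."""
    num = 2 * (sxy - sx * sy / n)
    den = (sxx - sx ** 2 / n) + (syy - sy ** 2 / n)
    return num / den

def ml_equal_means_and_variances(n, sx, sy, sxx, syy, sxy):
    """Three-parameter case (equal means and variances), formula (6.36)."""
    correction = (sx + sy) ** 2 / (2 * n)
    return (2 * sxy - correction) / (sxx + syy - correction)

# Column sums: n, sum X, sum Y, sum X^2, sum Y^2, sum XY.
biology = dict(n=25, sx=4038, sy=4178, sxx=681148, syy=734044, sxy=704643)
reading = dict(n=30, sx=1332, sy=1285, sxx=64938, syy=59429, sxy=61676)

r_split = product_moment_r(**biology)
rho_split = ml_equal_means_and_variances(**biology)
r_forms = product_moment_r(**reading)
rho_forms = ml_equal_variances(**reading)
print(r_split, rho_split, r_forms, rho_forms)
```

Working from sums keeps the arithmetic identical to the hand calculation in the text; the same functions could be fed sums accumulated from the raw score columns.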

It should be observed that, prior to the application of the method
of estimation, the fundamental assumptions underlying the equivalent
forms method of testing reliability were tested. The assumption
here is that the standard deviations of the scores on the two forms of the
test are equal. The F-test ($F_0 = 1.32$) showed that this assumption was
satisfied.

TABLE 34

The Scores on Two Forms of a Reading Test of a Random Sample of 30 Students

Individual    Form B, X    Form A, Y
     1            46           39
     2            47           45
     3            46           42
     4            27           35
     5            59           53
     6            74           64
     7            30           27
     8            50           41
     9            56           50
    10            41           43
    11            24           25
    12            27           32
    13            37           34
    14            59           54
    15            36           38
    16            42           42
    17            41           45
    18            49           39
    19            29           28
    20            57           50
    21            27           26
    22            49           46
    23            34           26
    24            14           23
    25            44           50
    26            48           46
    27            61           64
    28            70           69
    29            58           49
    30            50           60
N = 30  Total   1332         1285

A more stringent test of the equivalence of the two forms of the test
can be made by applying three sample criteria proposed by Wilks (Ref.
34) for testing the equality of means, equality of variances, and equality
of covariances. The $L_{mvc}$ criterion (two variables) is

$$L_{mvc} = \frac{s_{11}s_{22} - s_{12}^2}{\left[\tfrac{1}{2}(s_{11} + s_{22}) + \tfrac{1}{4}(\bar{X}_1 - \bar{X}_2)^2\right]^2 - \left[s_{12} - \tfrac{1}{4}(\bar{X}_1 - \bar{X}_2)^2\right]^2} \qquad (6.37)$$

where $\bar{X}_1$ and $\bar{X}_2$ are the means; $s_{11}$ and $s_{22}$ are the
variances; and $s_{12}$ is the covariance between the two forms.

Although tests of significance may be made by the use of the prepared
tables, an exact level of significance is given by

$$P = L_{mvc}^{\frac{1}{2}(n-2)}$$

From the scores of the thirty individuals given in Table 34, we make
the following calculations:

$\bar{X}_1 = 44.4$
$\bar{X}_2 = 42.83$
$s_{11} = \tfrac{1}{30}(64{,}938) - (44.4)^2 = 193.24$
$s_{22} = \tfrac{1}{30}(59{,}429) - (42.8333)^2 = 146.2751$
$s_{12} = \tfrac{1}{30}(61{,}676) - (44.4000)(42.8333) = 154.0682$
$\bar{X} = \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2) = \tfrac{1}{2}(87.23) = 43.62$

Substituting the values required in (6.37), we get

$$L_{mvc} = .8268$$

and

$$P = L_{mvc}^{\frac{1}{2}(n-2)} = (.8268)^{14} = .07$$

Since this probability exceeds .05, we conclude that the two forms of the
test are parallel or equivalent.
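The criterion can be recomputed directly from the sums of the two score columns. The sketch below assumes the two-variable form of (6.37), with variances and covariance taken with divisor n, and assumes the exact significance level is the criterion raised to the power (n - 2)/2, as in the worked example. The function name is illustrative.

```python
def wilks_lmvc_two_forms(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return (L_mvc, P) for two forms taken by the same n individuals."""
    mean_1, mean_2 = sum_x / n, sum_y / n
    s11 = sum_xx / n - mean_1 ** 2          # variance of form 1
    s22 = sum_yy / n - mean_2 ** 2          # variance of form 2
    s12 = sum_xy / n - mean_1 * mean_2      # covariance between forms
    quarter_d2 = 0.25 * (mean_1 - mean_2) ** 2
    pooled_var = 0.5 * (s11 + s22) + quarter_d2   # variance about the grand mean
    pooled_cov = s12 - quarter_d2                 # covariance about the grand mean
    l_mvc = (s11 * s22 - s12 ** 2) / (pooled_var ** 2 - pooled_cov ** 2)
    p_value = l_mvc ** ((n - 2) / 2)
    return l_mvc, p_value

# Sums for the 30 students on Forms B (X) and A (Y) of the reading test.
l_mvc, p_value = wilks_lmvc_two_forms(30, 1332, 1285, 64938, 59429, 61676)
print(round(l_mvc, 4), round(p_value, 2))
```

The denominator is the determinant the two forms would have if they shared one mean, one variance, and one covariance, so the criterion equals 1 only when the sample already satisfies all three equalities.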

Problem VI.11. Determining the sensitivity of a test. Jackson
(Ref. 18) applied analysis of variance methods and the methods of testing
statistical hypotheses to the problem of determining the reliability of a
test.³ He treated four different problems: the determination of (1) the
existence of a significant practice effect and (2) whether or not the test
measures the capacities of the individuals tested; and the estimation of
(3) the practice effect, if it is found to exist, and (4) the relative
importance of the random errors of measurement with respect to the true
measurement of the capacity of the individual. Jackson introduced a new
statistic, $\gamma$, which he called the sensitivity of the test, defined as the
ratio of the standard deviation of true scores to the standard deviation of
the distribution of errors of measurement.

The method of Jackson is applied to the scores of a random sample of
30 students on two forms, A and B, of a reading test, the same data as
were used in Problem VI.10. The original data and calculations are
given in Table 35.

It is assumed that each individual’s score on the test is the sum of a
number of independent components and that the analysis gives a measure
of the influence of each. One component is the difference in ability
between the individuals tested. Noting the scores of the individual
students in columns (2) and (3), it is observed that the students on the
average make higher scores on form B than on form A. Form A was
given first, so that this difference is called a measure of the “practice”
effect. Even when allowance is made for the influence of practice effect,
the scores on the two forms differ considerably. It is assumed that these
residual differences are attributable to the errors of measurement of the
test used. Other factors may exist, such as fluctuations in the ability of
the individual students and differences between the two forms. Since
these factors are not isolated, they are included, if they exist, in the
measurement of error. The method used to measure the effect of each of
the components is that of the analysis of variance, which consists in
breaking up the sum of squares of the deviations about the grand mean
into parts assigned to the respective factors. In this way, the importance
of the influence of the respective components can be established and
conclusions can be made with respect to the value of the test as a
measuring instrument.

³ The student may find it advantageous to follow through this method after he has
studied the analysis of variance (see page 226). For the method of testing statistical
hypotheses see page 63.

TABLE 35

Scores of Freshman Students in the College of Agriculture on Forms A and B
of a Reading Test

Student     Form B,   Form A,   Sum of scores,   Difference in scores,
  No.          X         Y          X + Y               X - Y
  (1)         (2)       (3)          (4)                 (5)
   1           46        39           85                   7
   2           47        45           92                   2
   3           46        42           88                   4
   4           27        35           62                  -8
   5           59        53          112                   6
   6           74        64          138                  10
   7           30        27           57                   3
   8           50        41           91                   9
   9           56        50          106                   6
  10           41        43           84                  -2
  11           24        25           49                  -1
  12           27        32           59                  -5
  13           37        34           71                   3
  14           59        54          113                   5
  15           36        38           74                  -2
  16           42        42           84                   0
  17           41        45           86                  -4
  18           49        39           88                  10
  19           29        28           57                   1
  20           57        50          107                   7
  21           27        26           53                   1
  22           49        46           95                   3
  23           34        26           60                   8
  24           14        23           37                  -9
  25           44        50           94                  -6
  26           48        46           94                   2
  27           61        64          125                  -3
  28           70        69          139                   1
  29           58        49          107                   9
  30           50        60          110                 -10
Sum          1332      1285         2617                  47
Sum of
squares    64,938    59,429      247,719                1015

The calculations involved in the analysis of variance are as follows:

(1) Calculate for each student the sum of the scores and the difference
between his scores on the two forms, as indicated in columns (4) and (5),
Table 35.

(2) Calculate the sum and sum of squares of the numerical values
in each of the columns (2), (3), (4), and (5), and record these in the two
bottom rows of the table. Note the following checks on the calculations:

(a) 1332 + 1285 = 2617

(b) 1332 - 1285 = 47

(c) 247,719 + 1015 = 2(64,938 + 59,429)

(3) Calculate the sum of squares for each component as follows:

(a) For error: $\frac{1}{2}\left[1015 - \frac{(47)^2}{30}\right] = 470.683$

(b) For between individuals: $\frac{1}{2}\left[247{,}719 - \frac{(2617)^2}{30}\right] = 9714.683$

(c) For practice effect: $\frac{(47)^2}{60} = 36.817$

(d) For total: $64{,}938 + 59{,}429 - \frac{(2617)^2}{60} = 10{,}222.183$

These values are then recorded in an analysis of variance table (Table
36).
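The sums of squares in step (3) can be recomputed from the column totals. The sketch below is one possible arrangement, with a hypothetical function name. It rebuilds the sums of squares of columns (4) and (5) from the sums of squares of X and Y and the sum of products (61,676 for these data), then forms the four components and the two F ratios used in the discussion of the table.

```python
def two_form_anova(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Split the total sum of squares for n persons tested on two forms.

    Returns a dict mapping each source to (sum of squares, degrees of freedom).
    """
    grand_total = sum_x + sum_y
    total_diff = sum_x - sum_y
    ss_of_sums = sum_xx + sum_yy + 2 * sum_xy     # sum of (X + Y)^2, column (4)
    ss_of_diffs = sum_xx + sum_yy - 2 * sum_xy    # sum of (X - Y)^2, column (5)

    ss_error = 0.5 * (ss_of_diffs - total_diff ** 2 / n)
    ss_individuals = 0.5 * (ss_of_sums - grand_total ** 2 / n)
    ss_practice = total_diff ** 2 / (2 * n)
    ss_total = sum_xx + sum_yy - grand_total ** 2 / (2 * n)

    return {
        "practice": (ss_practice, 1),
        "individuals": (ss_individuals, n - 1),
        "error": (ss_error, n - 1),
        "total": (ss_total, 2 * n - 1),
    }

anova = two_form_anova(30, 1332, 1285, 64938, 59429, 61676)
mean_squares = {name: ss / df for name, (ss, df) in anova.items() if name != "total"}
f_practice = mean_squares["practice"] / mean_squares["error"]
f_individuals = mean_squares["individuals"] / mean_squares["error"]
for name, (ss, df) in anova.items():
    print(f"{name:12s} df={df:3d} ss={ss:12.3f}")
```

The three component sums of squares add up to the total, and their degrees of freedom (1, 29, and 29) add up to 59, which provides a further check on the hand calculation.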


TABLE 36

Analysis of Variance of Scores of Freshmen on Two Forms of a Reading Test

Source of variation     Degrees of freedom   Sum of squares   Mean square
Practice effect                  1                 36.817         36.817
Between individuals             29              9,714.683        334.989
Error                           29                470.683         16.230
Total                           59             10,222.183

The following applications can now be made of the results shown in
Table 36:

An estimate of the standard error of measurement of an individual
score, $s_E$, is obtained by taking the square root of the error mean square.
We get

$$s_E = \sqrt{16.230} = 4.03 \text{ score units}$$

This gives a direct estimate of the absolute accuracy of the measurements.

The next problem is to test whether there is a significant practice
effect, that is, whether it is significantly different from zero. This
hypothesis is tested by calculating first the ratio of the mean square due
to practice effect to the mean square due to error:

$$F = \frac{36.817}{16.230} = 2.27$$

We then refer to Snedecor’s table (Table IV, Appendix) of F with degrees
of freedom $n_1 = 1$ and $n_2 = 29$. We find that the 5 per cent point of
F is 4.18. Since the observed value of F, 2.27, is less than 4.18, we
conclude that there is no significant practice effect.

The next step is to find out whether the test measures sufficiently
accurately to distinguish among the individual students. This is
determined by taking the ratio of the mean square between individuals to
the error mean square. Thus:

$$F = \frac{334.989}{16.230} = 20.64$$

The degrees of freedom are $n_1 = n_2 = 29$. Referring to Snedecor’s table
at the nearest tabled entry, $n_1 = 30$ and $n_2 = 29$, we find that
$F_{.05} = 1.85$ and $F_{.01} = 2.41$. Therefore we conclude, since
20.64 is greater than 2.41, that the two mean squares differ significantly
and hence that the test measures with sufficient accuracy to distinguish
between the individuals tested.

The next problem is to determine the relative accuracy of measurement,
that is, the relation between the magnitude of the errors of measurement
and the size of the differences among individuals. This is given by
Jackson’s measure, $\gamma$, called the sensitivity of the test:

$$\gamma = \frac{\sigma_c}{\sigma}$$

where $\sigma_c$ is the standard deviation of the distribution of ability in the
population sampled, and $\sigma$ is the standard deviation of the distribution
of errors of measurement.

The unique estimate of $\gamma$ is obtained as follows:

(a) Subtract the error mean square from the mean square between
individuals.

(b) Divide the difference by twice the error mean square.

(c) Take the square root of the quotient as an estimate of $\gamma$.

From the values in Table 36, we get

(a) 334.989 - 16.230 = 318.759

(b) $\frac{318.759}{2(16.23)} = 9.82$

(c) Estimated $\gamma = \sqrt{9.82} = 3.13$


The confidence interval is set up as follows:

(a) Calculate the ratio of the mean square between individuals to the
error mean square, denoted by F.

(b) Obtain the $F_{.05}$ and $F_{.01}$ points of the distribution of F from
Snedecor’s table.

(c) The lower limit of the interval, using $F_{.01}$ for example, denoted by
$\underline{\gamma}$, is given by

$$\underline{\gamma} = \sqrt{\frac{F}{2F_{.01}} - \frac{1}{2}} \qquad (6.38)$$

(d) The upper limit of the interval, $\bar{\gamma}$, using $F_{.01}$ for example, is
obtained from

$$\bar{\gamma} = \sqrt{\frac{F \cdot F_{.01} - 1}{2}} \qquad (6.39)$$

(e) We may make the statement that

$$\underline{\gamma} < \gamma < \bar{\gamma}$$

and the probability that the statement is correct is .98.

For our problem, we get

(a) $F = \frac{334.989}{16.230} = 20.64$

(b) $F_{.01} = 2.42$

(c) $\underline{\gamma} = \sqrt{\frac{20.64}{4.84} - \frac{1}{2}} = 1.94$

(d) $\bar{\gamma} = \sqrt{\frac{49.95 - 1}{2}} = 4.95$

(e) $1.94 < \gamma < 4.95$

Jackson gives the following relation between the sensitivity and the
reliability coefficient in the population:

$$\rho = \frac{\gamma^2}{1 + \gamma^2} \qquad (6.40)$$

where $\rho$ denotes the population reliability coefficient.

From Jackson's Table XI (Ref. 18), the values of $\rho$ corresponding to
$\underline{\gamma} = 1.94$ and $\bar{\gamma} = 4.95$ are approximately .80 and .96,
respectively. The true values of $\gamma$ and of $\rho$ are, of course, unknown.
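The estimate of the sensitivity and its limits can be sketched as below. The limit formulas are written to agree with the worked figures (1.94 and 4.95). The conversion from sensitivity to reliability assumes the usual relation in which reliability is the share of true-score variance in total variance; Jackson's Table XI is not reproduced here, so that function is an assumption rather than a transcription.

```python
import math

def sensitivity_estimate(ms_individuals, ms_error):
    """Point estimate of gamma from the two mean squares."""
    return math.sqrt((ms_individuals - ms_error) / (2 * ms_error))

def sensitivity_limits(ms_individuals, ms_error, f_critical):
    """Lower and upper limits of gamma for a tabled F point (equal d.f.)."""
    f_observed = ms_individuals / ms_error
    lower = math.sqrt((f_observed / f_critical - 1) / 2)
    upper = math.sqrt((f_observed * f_critical - 1) / 2)
    return lower, upper

def reliability_from_sensitivity(gamma):
    """Assumed relation: rho = gamma^2 / (1 + gamma^2)."""
    return gamma ** 2 / (1 + gamma ** 2)

# Mean squares from the analysis of variance of the reading test,
# and the 1 per cent point of F used in the worked example.
gamma_hat = sensitivity_estimate(334.989, 16.230)
gamma_low, gamma_high = sensitivity_limits(334.989, 16.230, 2.42)
rho_low = reliability_from_sensitivity(gamma_low)
rho_high = reliability_from_sensitivity(gamma_high)
print(gamma_hat, gamma_low, gamma_high, rho_low, rho_high)
```

Applied to the point estimate 3.13, the same relation gives a reliability of about .91, in line with the maximum likelihood estimate found for these data in Problem VI.10.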

Problem VI.12. Determination of the reliability coefficient by means
of the analysis of variance. Hoyt (Ref. 17) developed a formula for
estimating the reliability of a test also by means of the method of analysis
of variance. The data used in the calculation are the number of correct
responses to each item and the score on the test for each individual. The
total sum of squares is broken down into three components: (1) between
individuals, (2) between items, and (3) residual component or error.

The residual sum of squares, obtained by subtracting the sums of squares
among individuals and among items from the total, is used to estimate the
discrepancy between the obtained and the true variance.

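The necessary calculations are developed as follows. First, set up a
table for tabulation of the required data as shown in Table 37, where
s = 1, 2, ..., k; i = 1, 2, ..., n; k denotes the number of items; n
is the number of individuals; and $X_{si}$ denotes the score of the ith
individual on the sth item, which is presumably 1 or 0.

TABLE 37

Tabulation of Data Necessary for Determining Reliability by Hoyt’s Method

                              Item
Individual      1          2        ...       k           Score
    1        X_{11}     X_{21}      ...    X_{k1}     \sum_s X_{s1}
    2        X_{12}     X_{22}      ...    X_{k2}     \sum_s X_{s2}
   ...
    n        X_{1n}     X_{2n}      ...    X_{kn}     \sum_s X_{sn}
  Total   \sum_i X_{1i} \sum_i X_{2i} ... \sum_i X_{ki}  \sum_s \sum_i X_{si}

Let us define:

Grand mean, $\bar{X}_{..} = \dfrac{\sum_s \sum_i X_{si}}{N}$, where N = kn.

Mean of columns, $\bar{X}_{s.} = \dfrac{\sum_i X_{si}}{n}$

Mean of rows, $\bar{X}_{.i} = \dfrac{\sum_s X_{si}}{k}$

The sum of squares between items is

$$n\sum_s (\bar{X}_{s.} - \bar{X}_{..})^2 = \frac{\sum_s \left(\sum_i X_{si}\right)^2}{n} - \frac{\left(\sum_s \sum_i X_{si}\right)^2}{N} \qquad (6.41)$$

The sum of squares between individuals is

$$k\sum_i (\bar{X}_{.i} - \bar{X}_{..})^2 = \frac{\sum_i \left(\sum_s X_{si}\right)^2}{k} - \frac{\left(\sum_s \sum_i X_{si}\right)^2}{N} \qquad (6.42)$$

Since $X_{si} = 1$ or 0,

$$X_{si} = X_{si}^2$$

and the total sum of squares is

$$\sum_s \sum_i X_{si}^2 - \frac{\left(\sum_s \sum_i X_{si}\right)^2}{N} = \frac{\sum_s \sum_i X_{si}\left(N - \sum_s \sum_i X_{si}\right)}{N} = \frac{n_1 n_2}{N} \qquad (6.43)$$

where $n_1 = \sum_s \sum_i X_{si}$, or the number of correct responses of all
individuals on all the items, and $n_2$ is the number of incorrect responses.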

We shall apply this method to an examination in college mathematics
consisting of 80 items (k = 80) for a class of 119 students (n = 119).
The calculations (summary values only) are

(1) The sum of squares between individuals:

$$\frac{\sum_i \left(\sum_s X_{si}\right)^2}{k} - \frac{\left(\sum_s \sum_i X_{si}\right)^2}{N} = \frac{338{,}042}{80} - \frac{(6216)^2}{9520} = 166.8426$$

(2) The sum of squares between items:

$$\frac{\sum_s \left(\sum_i X_{si}\right)^2}{n} - \frac{\left(\sum_s \sum_i X_{si}\right)^2}{N} = \frac{528{,}634}{119} - \frac{(6216)^2}{9520} = 383.6201$$

(3) The total sum of squares:

$$\frac{n_1 n_2}{N} = \frac{(6216)(9520 - 6216)}{9520} = 2157.3176$$

These values are then recorded in Table 38, Analysis of Variance.

TABLE 38

Analysis of Variance of the Scores on a Test in College Mathematics

Source of                  Sum of        Mean                  Hypothesis
variation          D.F.    squares       squares       F*      tested
Between
individuals         118     166.8426     1.4139 (a)    8.20    Reject
Between items        79     383.6201     4.8560 (b)   28.17    Reject
Residual           9322    1606.8549     0.1724 (c)
Total              9519    2157.3176

* F = (a)/(c) = 1.4139/0.1724 = 8.20, P < .01
  F = (b)/(c) = 4.8560/0.1724 = 28.17, P < .01

The following uses can be made of the results in the table:

(1) To test the hypothesis that there is no difference between the
means of individuals. We calculate the ratio of the mean square due to
individuals to the mean square of residual: F = 1.4139/0.1724 = 8.20. We
then refer to Snedecor’s table of F (Table IV, Appendix) with degrees
of freedom $n_1 = 118$ and $n_2 = 9322$. We find that the 1 per cent point
of F for $n_1 = 100$ and $n_2 = \infty$ is 1.36. We could interpolate to get the
value for $n_1 = 118$ and $n_2 = 9322$, but this operation is unnecessary, since
it is obvious that the obtained value of F is much greater than the
table value. Therefore, we conclude that the test measures sufficiently
accurately to differentiate among individuals.

(2) To estimate the precision with which the test measures, we may
compute the reliability coefficient, $r_{tt}$, as follows:

$$r_{tt} = \frac{\text{mean square between individuals} - \text{mean square of residual}}{\text{mean square between individuals}} = \frac{1.4139 - 0.1724}{1.4139} = 0.88 \qquad (6.44)$$


A measure of the absolute accuracy of the test is given by the standard
error of measurement of an individual score, $s_E$, where

$$s_E = \sqrt{\frac{\text{residual s.s.}}{\text{D.F. between individuals}}} = \sqrt{\frac{1606.8549}{118}} = 3.69 \text{ score units}$$
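Hoyt's procedure needs only four summary quantities, so it can be sketched compactly. The function below is an illustrative arrangement, not Hoyt's own layout. It takes the sum of squared individual scores, the sum of squared item totals, and the total number of correct responses, and it assumes that the standard error of measurement is the square root of the residual sum of squares divided by the degrees of freedom between individuals.

```python
import math

def hoyt_reliability(k_items, n_persons, sum_sq_scores, sum_sq_item_totals, n_correct):
    """Analysis of variance of a persons-by-items table of 1/0 responses."""
    n_cells = k_items * n_persons
    correction = n_correct ** 2 / n_cells
    ss_persons = sum_sq_scores / k_items - correction           # (6.42)
    ss_items = sum_sq_item_totals / n_persons - correction      # (6.41)
    ss_total = n_correct * (n_cells - n_correct) / n_cells      # (6.43)
    ss_residual = ss_total - ss_persons - ss_items
    df_persons = n_persons - 1
    df_residual = (n_persons - 1) * (k_items - 1)
    ms_persons = ss_persons / df_persons
    ms_residual = ss_residual / df_residual
    return {
        "ss_persons": ss_persons,
        "ss_items": ss_items,
        "ss_total": ss_total,
        "ss_residual": ss_residual,
        "f_persons": ms_persons / ms_residual,
        "r_tt": (ms_persons - ms_residual) / ms_persons,        # (6.44)
        "s_e": math.sqrt(ss_residual / df_persons),
    }

# Summary values for the 80-item mathematics examination taken by 119 students.
result = hoyt_reliability(80, 119, 338042, 528634, 6216)
for name, value in result.items():
    print(f"{name:12s} {value:10.4f}")
```

Because the responses are scored 1 or 0, the total sum of squares needs no item-by-item squaring; the count of correct responses is enough.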


Problem VI.13. Determination of the reliability of the test by the
method of rational equivalence. Kuder and Richardson developed a
method of determining the reliability of a test, which they called the
method of rational equivalence (Ref. 20). The term “rational equivalence”
arises from the conception of a given test as being equivalent to a
hypothetical parallel form where every item on the one form is
interchangeable with the corresponding item on the other and thus where
each pair of items is equivalent with respect to content and difficulty.
Furthermore, it is assumed that all corresponding correlations among the
items are equal.

A number of formulas representing varying degrees of rigor are
presented. Only the one recommended for general use is given here [Ref.
20, Formula (20)]:

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - n\,\overline{pq}}{\sigma_t^2} \qquad (6.45)$$

where $r_{tt}$ is the reliability coefficient; n, the number of items;
$\sigma_t^2$, the variance of the test scores; and $\overline{pq}$, the mean
variance of the items.

Jackson and Ferguson (Ref. 19) point out that the derivation of
Formula (6.45) can be made on the basis of the equivalence assumption
only. We present their derivation:

The variance of a test of n items, as a function of the item variances
and interitem covariances, is

$$s_t^2 = \sum_i s_i^2 + 2\sum_{i<j} r_{ij}s_is_j = n\,\overline{s_i^2} + n(n-1)\,\overline{r_{ij}s_is_j} \qquad (6.45a)$$

where $s_t^2$ = variance of the test
$s_i^2$ = variance of item i
$s_j^2$ = variance of item j
$r_{ij}$ = correlation between items i and j
$\overline{s_i^2}$ = average item variance
$\overline{r_{ij}s_is_j}$ = average item covariance
n = number of items

Assuming the existence of a hypothetical parallel form of the test, also
of n items, the variance of the sum of these two tests is

$$s_{2t}^2 = 2n\,\overline{s_i^2} + 2n(2n-1)\,\overline{r_{ij}s_is_j} \qquad (6.46)$$

where $s_{2t}^2$ = variance of the sum of scores on the two equivalent forms
of the test.

It is known from the correlation of sums that

$$s_{2t}^2 = 2s_t^2(1 + r_{tt}) \qquad (6.47)$$

where $r_{tt}$ = correlation between the test and its hypothetical equivalent,
or the reliability coefficient.



138 ESTIMATION OF POPULATION PARAMETERS [Chap. VI 


When the values of $s_t^2$ in (6.45a) and $s_{2t}^2$ in (6.46) are substituted
in (6.47) and the equation is solved for $r_{tt}$, we have

$$r_{tt} = \frac{n}{n-1} \cdot \frac{s_t^2 - \sum s_i^2}{s_t^2} \qquad (6.48)$$

which formula is identical with (6.45), since for items scored 1 or 0 the
sum of the item variances is $n\,\overline{pq}$.

It is to be noted that the assumption made in this derivation, that
$\overline{r_{ij}s_is_j} = \overline{r_{i'j'}s_{i'}s_{j'}} = \overline{r_{ij'}s_is_{j'}}$
(that is, that the covariances are on the average equal), is somewhat less
stringent than the equivalence assumption. In the latter it is specified
that $r_{ij} = r_{i'j'} = r_{ij'}$ and $s_i = s_{i'}$, where the
primes (') refer to the hypothetical equivalent form.
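The algebra leading from (6.45a), (6.46), and (6.47) to (6.48) is not written out in the text. One way to fill it in is sketched below. The shorthand v for the average item variance and c for the average item covariance is introduced here only for brevity, and $s_{2t}^2$ stands for the variance of the sum of the two forms.

```latex
% v = average item variance, c = average item covariance (shorthand only).
\begin{align*}
s_t^2 &= nv + n(n-1)c
  && \text{(6.45a)} \\
s_{2t}^2 &= 2nv + 2n(2n-1)c
  && \text{(6.46)} \\
2s_t^2(1 + r_{tt}) &= 2nv + 2n(2n-1)c
  && \text{(6.47) combined with (6.46)} \\
c &= \frac{s_t^2 - nv}{n(n-1)}
  && \text{solving (6.45a) for } c \\
s_t^2(1 + r_{tt}) &= nv + \frac{(2n-1)(s_t^2 - nv)}{n-1}
  && \text{substituting } c \\
r_{tt}\,s_t^2 &= (s_t^2 - nv)\left[\frac{2n-1}{n-1} - 1\right]
  = \frac{n}{n-1}\,(s_t^2 - nv) \\
r_{tt} &= \frac{n}{n-1}\cdot\frac{s_t^2 - nv}{s_t^2},
  \qquad nv = \sum s_i^2
  && \text{(6.48)}
\end{align*}
```

The only assumption used is that the average covariance c is the same within a form and between forms, which is the weaker condition noted in the text.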

As an illustration of this method, we present the results of the
administration of an Industrial Relations Classification Test of 100 test
items to a college class of 62 students. An analysis of the scores on the
test gave the following values:

Test variance, $\sigma_t^2$ = 169.5067

Average variance of the test items, $\overline{pq}$ = .148299

Reliability coefficient,

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - n\,\overline{pq}}{\sigma_t^2} = \frac{100}{99} \cdot \frac{169.5067 - 100(.148299)}{169.5067} = .92$$

Formula (6.45) is not in an efficient form for calculation. Hoyt
(Ref. 16) suggests the following variant:

$$r_{tt} = \frac{n}{n-1} \cdot \frac{kS_s + S_i - T(T + k)}{kS_s - T^2} \qquad (6.49)$$

where T = sum of scores of all individuals
$S_s$ = sum of squares of each of the scores for all individuals
$S_i$ = sum of squares of each of the total correct responses for all
items
k = number of individuals taking the test
n = number of items in the test

Applying (6.49) to the data from the above test, we get:

$$r_{tt} = \frac{100}{99} \cdot \frac{62(52{,}734) + 42{,}929 - 1618(1618 + 62)}{62(52{,}734) - (1618)^2} = .92$$
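The two forms of the Kuder-Richardson coefficient can be compared in a short sketch. The first function follows (6.45) and the second follows Hoyt's computing variant (6.49). The small response table at the end is hypothetical and serves only to show that the two forms agree when both are computed from the same 1/0 data, with the test variance taken with divisor k.

```python
def kr_reliability(n_items, test_variance, mean_item_variance):
    """Formula (6.45): uses the test variance and the mean of p*q."""
    return (n_items / (n_items - 1)
            * (test_variance - n_items * mean_item_variance) / test_variance)

def kr_reliability_from_sums(n_items, k_persons, total, sum_sq_scores, sum_sq_item_totals):
    """Formula (6.49): uses only totals and sums of squares."""
    numerator = k_persons * sum_sq_scores + sum_sq_item_totals - total * (total + k_persons)
    denominator = k_persons * sum_sq_scores - total ** 2
    return n_items / (n_items - 1) * numerator / denominator

# Values quoted for the Industrial Relations Classification Test.
r_from_variances = kr_reliability(100, 169.5067, 0.148299)
r_from_sums = kr_reliability_from_sums(100, 62, 1618, 52734, 42929)

# Hypothetical table: 5 persons (rows) by 4 items (columns), scored 1 or 0.
responses = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
k, n = len(responses), len(responses[0])
scores = [sum(row) for row in responses]
item_totals = [sum(row[j] for row in responses) for j in range(n)]
total = sum(scores)
variance = sum(s * s for s in scores) / k - (total / k) ** 2
mean_pq = sum((c / k) * (1 - c / k) for c in item_totals) / n
toy_a = kr_reliability(n, variance, mean_pq)
toy_b = kr_reliability_from_sums(n, k, total,
                                 sum(s * s for s in scores),
                                 sum(c * c for c in item_totals))
print(r_from_variances, r_from_sums, toy_a, toy_b)
```

On the published summary values the two results differ slightly in the third decimal place, presumably because of rounding in the quoted mean item variance; both round to .92.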

Degrees of Freedom

We have used the concept “degrees of freedom” a number of times
without defining it. Since it is such a fundamental concept in statistics,
we shall try to add to an understanding of it by referring for its
interpretation to three analogous settings: physics, geometry, and statistics.





Physical Interpretation. A rigid body which can move about in
space without changing the direction of any line in it is said to have a
motion of translation. It can also turn about any point, say P, without
the position of P changing, a motion known as a motion of rotation
about P. It can again have a motion compounded of a motion of
translation and one of rotation.

Take any convenient frame of reference, $O(X_1, X_2, X_3)$, fixed in a
rigid body. The position of the rigid body at any instant is defined
uniquely by the position of $O(X_1, X_2, X_3)$. We can specify the position
of the body axes by six parameters, for example, the Cartesian coordinates
a of O, with respect to fixed axes, and the three angular or polar
coordinates $\phi$ of O. Therefore, the rigid body is said to have 6 degrees
of freedom. The 6 degrees of freedom correspond to the positional
coordinates just specified. Of course, other equivalent sets of coordinates
may be taken. However, if a definite relation or relations are fixed or
assigned between the six parameters or positional coordinates, then the
rigid body is said to be subject to geometric or kinematic constraint and
has fewer than 6 degrees of freedom. Each restriction reduces the number
of degrees of freedom by 1. The fixture of one point of the body would
constitute a constraint and reduce the degrees of freedom of the body
by 1. Also, a point might be restricted to lie on a curved guide which in
turn is constrained to move in a prescribed way. Sliding or rolling
contact imposed between the body and either stable or movable guides
represents a more general kind of constraint. The constraints may be
represented by functional relations connecting the positional coordinates
or parameters (Ref. 15).

Geometric Interpretation. The geometric interpretation of degrees
of freedom grows out of a consideration of the conceptions derived from
the geometry of n-dimensional space. The geometrical or vectorial
representation of a sample as a vector⁴ with n orthogonal or mutually
perpendicular components was introduced into statistics by Fisher (Ref.
8). He carried out the first systematic investigations of the problems
underlying the exact sampling distribution of a number of statistics and
thus laid the basis for the solution of many theoretical problems of
statistical distributions.

It is well known that a one-to-one correspondence may be set up
between all real numbers, x, and all points on a straight line. A similar
correspondence exists between all pairs of real numbers $(x_1, x_2)$ and all
points in a plane; also between all triplets of real numbers $(x_1, x_2, x_3)$
and all points in a space of three dimensions. We may, then, generalize
by considering any system of n real numbers $(x_1, x_2, \ldots, x_n)$ as
representing a point or vector x in the n-dimensional (Euclidean) sample
space, $V_n$. A point in a line has freedom of movement in one dimension;
that is, it has 1 degree of freedom. A plane has two dimensions and a point
on a plane has 2 degrees of freedom. Likewise, in ordinary space of three
dimensions, a point in this space has 3 degrees of freedom. Generalizing,
a point in n-dimensional space may be said to have n degrees of freedom.

⁴ A vector is a quantity which has magnitude and direction. It is a matrix
consisting of one single row or column.

The numbers or values of the respective elements of a sample,
$x_1, x_2, \ldots, x_n$, are, then, the coordinates of the sample point P in
multiple dimensional space. The dimensionality of the sample point P is the
number of observations, n, in the sample. There are n degrees of freedom.
However, if a restriction be placed on the sample point, the number of
degrees of freedom is decreased by 1; that is, its dimensionality is reduced
by 1 and thus becomes n - 1. Correspondingly, each additional restriction
or section through sample space carries with it an additional reduction
in the dimensionality or number of coordinates. Thus, to restrict a
point in three-dimensional space to a surface, one condition is imposed on
its coordinates. To restrict a point in space of three dimensions to a
curve, it is necessary to subject its coordinates to two independent
conditions (Ref. 31).

An illustration of the reduction of dimensionality is given by consider¬ 
ing two planes whose equations are 

(1) 2x — y + 3z — 4 = 0 

(2) 2x — y + 5z + 3 = 0 

In (1) only two of the values are independent; given x = 5 and y = 12, 
the value of z is fixed as 2. Likewise in (2), given any two values for x 
and y t z is determined. In each case there are 2 degrees of freedom. If 
restrictions are imposed such that points which lie on both planes are 
to be determined, then they must lie on the line of intersection of the two 
planes. These points are determined by solving the equations for x and z 

in terms of y, or for y and z in terms of x . Thus: x = z = \ 

Now, there is only one independent observation. That is, by selecting 
values for y and calculating the corresponding values of x and z from 
these equations, any desired number of points on the line are obtainable. 
Since there is only one independent variable, the number of degrees of 
freedom is 1—the point can move up and down the line of intersection. 
The dimensionality has thus been reduced to 1. 
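The reduction from three free coordinates to one can be checked numerically. The sketch below is only an illustration: subtracting equation (1) from equation (2) gives 2z + 7 = 0, so z = -7/2, and substituting back gives x = (2y + 29)/4. The function names are hypothetical, and exact fractions are used so that the checks are not affected by rounding.

```python
from fractions import Fraction


def plane1(x, y, z):
    """Left-hand side of equation (1): 2x - y + 3z - 4."""
    return 2 * x - y + 3 * z - 4


def plane2(x, y, z):
    """Left-hand side of equation (2): 2x - y + 5z + 3."""
    return 2 * x - y + 5 * z + 3


def point_on_line(y):
    """Return the point of the intersection line for a freely chosen y.

    Only y is free: z is fixed at -7/2 and x follows from y.
    """
    y = Fraction(y)
    x = (2 * y + 29) / 4
    z = Fraction(-7, 2)
    return x, y, z


# One plane alone: x and y are free, z is determined (2 degrees of freedom).
print(plane1(5, 12, 2))  # the point (5, 12, 2) lies on plane (1)

# Both planes together: a single free value, y (1 degree of freedom).
points = [point_on_line(y) for y in (-3, 0, 12)]
for p in points:
    print(p, plane1(*p), plane2(*p))
```

Each choice of y yields one point that satisfies both equations, which is the sense in which the point "can move up and down the line" with a single degree of freedom.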

Statistical Interpretation. In its statistical application, the number 
of degrees of freedom is the number of free variables in the problem or in 
the distribution of the random variables connected with it. For each 
restriction imposed upon the original observations, such as in the estima¬ 
tion of a population value from the sample, the number of degrees of 
freedom is reduced by 1. 

It has been previously noted that the unbiased estimate of the population
variance from a sample of size n is obtained by dividing the sum of squares
of deviations of the individual observations from their mean by n - 1,
which is the number of degrees of freedom. It is observed
that this is the number of deviations reduced by the number of parameters
estimated from the sample and used to establish the point from which the
deviations are measured. In this case, the mean is found from the
sample, and hence the number of degrees of freedom is one less than the
number of observations.
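The effect of dividing by n - 1 rather than n can be seen by exhaustive enumeration. The following sketch assumes a small hypothetical population of four values and draws every possible ordered sample of size 3 with replacement; the population and the names are illustrative and are not taken from the text.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

# Hypothetical population, held as exact fractions to avoid rounding.
population = [Fraction(v) for v in (2, 4, 6, 8)]
mu = mean(population)
sigma2 = mean((v - mu) ** 2 for v in population)  # population variance


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = mean(sample)
    return sum((v - m) ** 2 for v in sample)


n = 3
samples = list(product(population, repeat=n))  # all 4**3 ordered samples

# Average of the two competing estimates over every possible sample.
avg_div_n = mean(sum_sq_dev(s) / n for s in samples)
avg_div_n_minus_1 = mean(sum_sq_dev(s) / (n - 1) for s in samples)

print("population variance:", sigma2)
print("mean of SS/n:       ", avg_div_n)
print("mean of SS/(n - 1): ", avg_div_n_minus_1)
```

Averaged over all samples, the divisor n - 1 reproduces the population variance exactly, while the divisor n falls short by the factor (n - 1)/n. This is the one degree of freedom used up in estimating the mean.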

In the case of establishing a regression line among a distribution of 
observed values, the straight line will fit any two observations with no 
residuals. Thus, in fitting the least-square line to 25 observations there 
are 23 degrees of freedom. Two degrees of freedom have been used up in 
estimating the two parameters in the regression equation (see page 88). 
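A brief sketch of the same point for regression, under the assumption of an ordinary least-squares straight line: two observations are fitted with no residuals, so only the observations beyond the first two contribute to the residual variation. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_sum_of_squares(xs, ys):
    """Sum of squared deviations of the observations from the fitted line."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def residual_degrees_of_freedom(n_observations, n_parameters=2):
    """Observations less the parameters estimated from them."""
    return n_observations - n_parameters


# Two observations: the line passes through both, leaving no residuals.
print(residual_sum_of_squares([1.0, 4.0], [3.0, 9.0]))

# Three observations (hypothetical): one degree of freedom remains.
print(residual_sum_of_squares([1.0, 4.0, 6.0], [3.0, 9.0, 10.0]))

# The case in the text: 25 observations, two estimated parameters.
print(residual_degrees_of_freedom(25))
```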

The principle that for each relationship imposed upon the original 
observations there is a corresponding reduction in the number of degrees 
of freedom originally available will be found to apply throughout statis¬ 
tical procedures. 6 


Problems 

1. Show that the maximum likelihood estimate of the population
reliability coefficient, ρ, for the case of the split-test method is

\[
\hat{\rho} = \frac{2\sum X_i Y_i - \dfrac{\left(\sum X_i + \sum Y_i\right)^2}{2N}}
                  {\sum X_i^2 + \sum Y_i^2 - \dfrac{\left(\sum X_i + \sum Y_i\right)^2}{2N}}
\]

where X_i and Y_i denote the scores obtained by the ith individual on
the odd and even items of the test, respectively; N, the number of
pairs of values; and ρ, the correlation coefficient in the sampled
population of X and Y.
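As an aid to checking hand computation, the estimate in Problem 1 can be evaluated directly from the sums it involves: twice the sum of products, the two sums of squares, and the common correction term (ΣX + ΣY)²/2N. The sketch below is an illustration only; the function name and the scores are hypothetical.

```python
def split_test_ml_reliability(odd, even):
    """Split-test estimate of reliability from paired half-test scores.

    Numerator:   2*sum(XY) - (sum(X) + sum(Y))**2 / (2N)
    Denominator: sum(X**2) + sum(Y**2) - (sum(X) + sum(Y))**2 / (2N)
    """
    if len(odd) != len(even) or len(odd) < 2:
        raise ValueError("need at least two paired scores")
    n = len(odd)
    sum_xy = sum(x * y for x, y in zip(odd, even))
    sum_squares = sum(x * x for x in odd) + sum(y * y for y in even)
    correction = (sum(odd) + sum(even)) ** 2 / (2 * n)
    return (2 * sum_xy - correction) / (sum_squares - correction)


# Hypothetical odd-item and even-item scores for five individuals.
odd_scores = [10, 12, 15, 18, 20]
even_scores = [11, 12, 14, 19, 19]
print(round(split_test_ml_reliability(odd_scores, even_scores), 4))
```

Two limiting cases give a quick check of the formula: identical halves yield 1, and halves that are exact mirror images about a common mean yield -1.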

2. Set up the confidence interval with a confidence coefficient of 95 per
cent for ρ, the population correlation coefficient. The sample value
is r = .77, the correlation between scores on Miller’s Analogies Test
and the Otis Intelligence Test for a random sampling of 50 graduate
students. Any of the following may be used: The exact tables of the
r-distribution (David, F. N., Tables of the Correlation Coefficient,
Biometrika Office, London, 1938); the transformation of r suggested
by Pillai (Pillai, K. C. S., Sankhya, Vol. 7, Part 4, pp. 415-422, July,
1946); or the logarithmic transformation due to R. A. Fisher.

The inequality that holds with probability equal to a given confidence
coefficient α (say 0.95), where a is determined from the normal scale
corresponding to α, is

\[
-a < \frac{\sqrt{N - 3}}{2}\,\log_e\!\left[\frac{(1 + r)(1 - \rho)}{(1 - r)(1 + \rho)}\right] < a
\]

6 For equivalence of degrees of freedom to orthogonal linear functions, see the
discussion of Analysis of Variance, Chapter X.

3. Given: Y_E = .6570X + 33.76

as the equation for estimating the score on a mid-quarter examination
from a knowledge of the score on Miller’s Analogies Test.

Sum of squared x-deviations = 10,584.88
Sum of squared y-deviations = 9788.50

n = 50

Set up the confidence interval for b_yx with a confidence coefficient of
99 per cent. Let β_yx be the population value. The quantity

\[ t = \frac{(b_{yx} - \beta_{yx})\sqrt{\sum x^2}}{s} \]

has Student's distribution with n - 2 degrees of freedom, where

\[ s^2 = \frac{\sum (Y - Y_E)^2}{n - 2} \]

4. With the aid of Nair’s tables (Ref. 22) find the 95 per cent and 99 per 
cent confidence intervals from the following values of the median: 

(a) Median = 38, N = 25 (c) Median = 42, N = 229 

(b) Median = 18, N = 25 (d) Median = 21, N = 219 

5. Set up the 99 per cent confidence interval for the difference between 
the percentages given below obtained in two public-opinion polls: 

n_1 = 3000, p_1 = .52; n_2 = 800, p_2 = .48

6. Set up the 98 per cent confidence interval for the difference in per¬ 
centages obtained on the same sample: 

68 per cent answered “yes” n = 500 

32 per cent answered “no” 

7. Plan in advance the size of sample necessary to provide from the 
sample an estimate of P so that the confidence belt will be of breadth 
about .05. Take a confidence coefficient of .95. The value of P 
from the sample is .60. [See also: Finney, D. J., “Errors of Esti¬ 
mation in Inverse Sampling,” Nature, Vol. 160 (1947), pp. 195-6.] 

8. Set up the fiducial limits of the true mean difference based on the 
data from the controlled experiment given in Problem 2, page 98. 
Use a fiducial probability of 95. 

9. Set up the fiducial limits of the variance of the distribution of differ¬ 
ences based on the data in Problem 2, page 98. Use a fiducial 
probability of 90. 
10. On a particular intelligence test a pupil received an I.Q. rating of 98.
On this test the standard error of an individual score is 4.51 I.Q. 
points. Set up the confidence interval for the true score of the 
pupil, using a confidence coefficient of 95 per cent. 

11. Given: Y_E = .6570X + 33.76

which is the equation for predicting Y_E, the score on a mid-quarter
examination from a knowledge of a score, X, on Miller's Analogies
Test.

r = .683, s_x^2 = 216.018, s_y^2 = 199.765, n = 50, \bar{X} = 69.32

Determine the confidence interval (99 per cent) for mid-quarter 
score for the following scores on Miller Analogies: 

(a) 99; (b) 69; (c) 27. 

(d) Explain your answer to (a) above. 

12. The following table (based on the 1940 Census) gives the percentage 
of adults over twenty-five years of age by states who had not com¬ 
pleted more than four years of school: 


State              Percentage   State               Percentage   State             Percentage

Iowa                   4.1      Dist. of Columbia       8.2      Rhode Island         13.7
Oregon                 5.2      Ohio                    8.4      Maryland             15.3
Idaho                  5.2      Nevada                  8.8      West Virginia        16.5
Utah                   5.5      Colorado                9.0      Florida              18.5
Washington             5.9      Wisconsin               9.4      Texas                18.8
Nebraska           [illegible]  Illinois                9.6      Arizona              19.4
Kansas                 6.1      Massachusetts          10.1      Kentucky             20.2
Vermont                6.1      Michigan               10.2      Tennessee            21.7
Wyoming                7.1      Missouri               10.3      Arkansas             23.1
South Dakota           7.2      North Dakota           10.8      Virginia             23.2
Montana                7.4      Connecticut            11.2      North Carolina       26.2
Maine                  7.4      New Jersey             12.0      New Mexico           27.3
Minnesota              7.5      New York               12.1      Alabama              28.9
Indiana                7.7      Pennsylvania           12.3      Georgia              30.1
California             8.1      Delaware               12.9      Mississippi          30.2
New Hampshire          8.1      Oklahoma               13.5      South Carolina       34.7
                                                                 Louisiana            35.7


Problem: Set up the tolerance limits for years of schooling of adults 
(take a = 90 per cent). How may the results be used in analyzing 
a state's educational program? 

13. Students of fiscal policies are invited to study the characteristics and 
use of grant-in-aid apportionment formulas in relation to setting up 
tolerance limits. (Cornell, Francis G., “Grant-in-aid Apportionment
Formulas,” Journal of the American Statistical Association, Vol. 42
(1947), pp. 92-104.)
14. The following tabular data are to be used for the problems below:

Scores of 25 Freshman Students on Test Forms 
A and B of a Science Reading Test 


Student No.    Score on Form A    Score on Form B

     1               18                 21
     2               33                 37
     3               38                 44
     4               29                 30
     5               64                 63
     6               74                 68
     7               33                 36
     8               72                 66
     9               58                 51
    10               56                 57
    11               28                 39
    12               71                 76
    13               53                 53
    14               39                 40
    15               37                 42
    16               29                 27
    17               58                 68
    18               20                 26
    19               65                 71
    20               28                 31
    21               16                 23
    22               50                 44
    23               29                 32
    24               46                 54
    25               36                 35

Problems: 

(a) Test the equivalence of the forms A and B of the reading test by 

(1) Testing the equality of the standard deviations of the scores 
on the two forms. 

(2) Testing the equality of means, variances, and covariances of 
the scores on the two forms. 

(b) Determine the reliability of the reading test by calculating the 
product-moment correlation coefficient. 

(c) Determine the reliability of the reading test by getting the maxi¬ 
mum likelihood estimate. 

(d) * Determine the sensitivity of the reading test. 

(e) Calculate the standard error of measurement of an individual 
score. 

* This may be postponed until the analysis of variance method has been studied. 
15. The following data are to be used in the problems below:

Scores of a Random Sample of 25 Students on
a Comprehensive Examination in
College Biology

Student No.    Score on odd items    Score on even items

     1                143                   145
     2                175                   179
     3                158                   157
     4                178                   172
     5                113                    94
     6                143                   140
     7                136                   139
     8                234                   243
     9                201                   207
    10                203                   213
    11                222                   248
    12                200                   184
    13                195                   191
    14                126                   136
    15                186                   208
    16                163                   186
    17                160                   158
    18                188                   197
    19                196                   206
    20                253                   249
    21                206                   196
    22                167                   154
    23                148                   142
    24                188                   186
    25                204                   221


Problems: 

(a) Before attempting the methods of determining the reliability of 
the biology test, test the assumption regarding the means and 
standard deviations on the two halves of the test. 

(b) If the assumptions in (a) are fulfilled, determine the reliability 
of each half of the test by calculating the product-moment corre¬ 
lation coefficient. 

(1) What are the assumptions underlying the use of the Spear¬ 
man-Brown formula? 

(2) If the assumptions in (1) are fulfilled, estimate the reliability 
coefficient of the whole test. 

(c) If the assumptions in (a) are fulfilled, calculate the reliability 
coefficient of the test by getting the maximum likelihood estimate. 

(d) Calculate the standard error of measurement of an individual
score.
16. Calculate the reliability coefficient for the English examination by
using the method of rational equivalence. The examination of 297
items was administered to a group of 209 college students. The
following values were computed from the examination results:

\bar{X} = 144.58          s_t^2 = 775.0656

s_t = 27.84               \overline{pq} = .245

n = 297

17. Calculate the reliability coefficient for a mathematics test of 75 items 
administered to 35 students by using the analysis of variance method 
(Hoyt). The basic data are given in Tables A and B. 


TABLE A

Number of Correct Responses to Each of the 75 Test Items

Item   f    Item   f    Item   f    Item   f    Item   f    Item   f    Item   f    Item   f

  1   22     11   11     21   18     31    7     41   24     51    9     61   16     71   20
  2   25     12   25     22   22     32    9     42   20     52    8     62   16     72   16
  3   25     13   11     23   23     33   24     43   26     53   18     63    1     73   23
  4   24     14    9     24   17     34   19     44    9     54   14     64    8     74   14
  5    8     15   17     25   17     35   26     45   16     55   10     65   20     75    2
  6   22     16   27     26    9     36   17     46   15     56    6     66   19
  7   27     17   13     27   14     37   15     47    4     57    9     67   11
  8   11     18   19     28   14     38   13     48   31     58   11     68    9
  9   23     19   23     29   16     39   22     49   24     59    7     69   14
 10   16     20   25     30   25     40   18     50   22     60   26     70   19




TABLE B

Total Scores of the 35 Students

Score   f    Score   f    Score   f    Score   f    Score   f

  55    1      45    1      36    1      30    1      25    2
  54    1      44    2      35    3      29    1      24    1
  52    2      43    1      34    1      28    2      23    1
  50    1      42    1      33    1      27    1      17    1
  48    1      41    1      31    1      26    3      16    1
  47    1      39    1








18. (a) Look up in some reference text or texts (Kelley, Truman L., 
Fundamentals of Statistics, for instance) the following methods 
of estimating correlation: 

(1) Biserial r 

(2) Point-biserial r 

(3) Biserial phi-coefficient 

(4) Correlation for a fourfold point-surface, or the phi-coefficient 

(5) Tetrachoric r 
(6) Coefficient of mean square contingency

(7) Correlation ratio 

(b) Specify the types of problems for which each method in (a) is 
designed. 

(c) What assumptions underlie the use of each method? 

(1) How may these assumptions be tested? 

(d) Which of the approximate measures of relationship are converti¬ 
ble to the product-moment scale, and under what conditions? 

19. Evaluate the several statistics that are in use as indices of internal 
consistency in item analysis. 

20. Plan in advance, from the data in Problem 9, Chapter 5, page 100,
the size of sample such that the probability will be .95 that the .99
confidence interval of the mean will have a length less than four
score units.
CHAPTER VII


NORMAL AND NORMALIZED DISTRIBUTIONS IN STATISTICS 

The assumption that measurements are distributed in normal probability
curves underlies much of statistical theory. The mathematical
conditions for normality have been determined (Ref. 8). The best
evidence of the fulfillment of these conditions in any particular case is
that which is available in the observations. Sometimes, then, it is
significant to show that observations are normally distributed or at least that
the available evidence indicates a high probability of such a distribution.

The Test of the Hypothesis of Normality. Standard statistical
methods are available for testing the hypothesis of normality. The
chi-square test of the goodness of fit of theoretical normal frequencies to
observed frequencies is a general test of the normality of a distribution of
measurements. The test based upon the criteria of Pearson is first
presented. Two criteria provide the basis of estimating the extent of
agreement between an observed distribution and the normal distribution with
respect to two characteristics, symmetry and kurtosis.

The criterion for symmetry is $\sqrt{\beta_1} = \mu_3 / \mu_2^{3/2}$. The criterion for
kurtosis is $\beta_2 = \mu_4 / \mu_2^2$. For the normal curve, $\sqrt{\beta_1} = 0$, and $\beta_2 = 3$.

It is observed that these criteria involve a second, third, and fourth 
moment. They are not affected by the size of the unit of measurement 
employed and are measures of the shape of the unimodal frequency 
distribution. The measurement of the form of variation of the distribu¬ 
tion is given in terms of symmetry and kurtosis, or the flatness of the 
mode. 

Pearson’s Test of Normality. The steps in the process of fitting the 
normal curve to a series of observations by the method of moments are 
described in detail below. 

1. Calculate the first four moment coefficients. 

(a) Moments about the mean and origin of ungrouped data. If X
is the variate, measured from the origin; $\bar{X}$ is the arithmetic mean; and
N is the size of the sample; then the sth moment coefficient, $\mu_s$, about the
mean is

\[ \mu_s = \frac{1}{N} \sum (X - \bar{X})^s \tag{7.01} \]

In practice, usually with machine calculation, it is convenient to
calculate first the powers of the observed values of X measured from the
origin. Then the sth moment coefficient, $\mu'_s$, about the origin is

\[ \mu'_s = \frac{\sum X^s}{N} \tag{7.02} \]

Then the first four moment coefficients about the mean can be found
from those about the origin from the following equations:

\[
\begin{aligned}
\mu_1 &= 0 \\
\mu_2 &= \mu'_2 - (\mu'_1)^2 \\
\mu_3 &= \mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3 \\
\mu_4 &= \mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4
\end{aligned}
\tag{7.03}
\]

These equations may be obtained by expanding the binomial, $(X - \bar{X})^s$,
and finding the mean for each term of the expansion, separately.

(b) Moments from grouped data.

When the original observations are first grouped into a frequency
distribution, it is assumed that all values in a class interval have the value
of its central point. Thus if $n_t$ is the number of observational values
in the tth class interval and $X_t$ is its central value, then the sth moment
coefficient, say $\nu'_s$, is given by

\[ \nu'_s = \frac{1}{N} \sum n_t X_t^s \tag{7.04} \]

The moment coefficients $\nu'_s$ for grouped data should then be reduced to the
values $\nu_s$ about the mean by means of equations as follows:

\[
\begin{aligned}
\nu_1 &= 0 \\
\nu_2 &= \nu'_2 - (\nu'_1)^2 \\
\nu_3 &= \nu'_3 - 3\nu'_1\nu'_2 + 2(\nu'_1)^3 \\
\nu_4 &= \nu'_4 - 4\nu'_1\nu'_3 + 6(\nu'_1)^2\nu'_2 - 3(\nu'_1)^4
\end{aligned}
\tag{7.05}
\]


2. Calculate the adjustments for grouping errors.

The assumption in grouped data is that the observations take the
value of the mid-point of the class interval. This assumption can be
more nearly fulfilled if corrections for grouping, known as Sheppard’s
corrections, are applied. No corrections are necessary in the first and
third moment, since the effects of grouping tend to balance out. They
are made in the second and fourth moments when the statistics are a
system of areas and the height of the curve tapers off gradually at both
tails. These corrections serve then to give a better estimate of the
parameter values. The sth moment coefficients, $\mu_s$, with Sheppard’s
corrections, are

\[
\begin{aligned}
\mu_1 &= \nu_1 \\
\mu_2 &= \nu_2 - \tfrac{1}{12}h^2 \qquad (h = \text{length of interval}) \\
\mu_3 &= \nu_3 \\
\mu_4 &= \nu_4 - \tfrac{1}{2}\nu_2 h^2 + \tfrac{7}{240}h^4
\end{aligned}
\tag{7.06}
\]

3. Calculate $\beta_1$ and $\beta_2$.

\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \tag{7.07} \]

If normal: $\sqrt{\beta_1} = 0$; $\beta_2 = 3$ (7.08)

4. Test whether the obtained values of $\sqrt{\beta_1}$ and $\beta_2$ differ significantly
from 0 and 3.

The exact sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ when the population
is normal have not been worked out, but E. S. Pearson (Ref. 18) has
determined approximate empirical frequency curves from the moments
of the sampling distributions. Tables giving values of $\sqrt{\beta_1}$ and $\beta_2$ are
available by which it can be determined according to size of sample how
much deviation may be expected from 0 and 3 due to random sampling
errors alone.

If either one or both of the criteria, $\sqrt{\beta_1}$ and $\beta_2$, differ significantly
from the values for the normal curve, 0 and 3 respectively, the hypothesis
that the sample could be a random sample from a normal population is
rejected.


TABLE 39

The Computation of the First Four Moments for Use in Determining
Pearson's Criteria of Normality

Group interval     f     x = (X - 144.5)/10     fx      fx^2      fx^3      fx^4
     (1)          (2)           (3)             (4)      (5)       (6)       (7)

229.5-239.5         3             9              27       243     2,187    19,683
219.5-229.5        14             8             112       896     7,168    57,344
209.5-219.5        31             7             217     1,519    10,633    74,431
199.5-209.5        50             6             300     1,800    10,800    64,800
189.5-199.5        56             5             280     1,400     7,000    35,000
179.5-189.5        78             4             312     1,248     4,992    19,968
169.5-179.5        75             3             225       675     2,025     6,075
159.5-169.5        81             2             162       324       648     1,296
149.5-159.5        81             1              81        81        81        81
139.5-149.5        81             0               0         0         0         0
129.5-139.5        77            -1             -77        77       -77        77
119.5-129.5        53            -2            -106       212      -424       848
109.5-119.5        46            -3            -138       414    -1,242     3,726
 99.5-109.5        31            -4            -124       496    -1,984     7,936
 89.5- 99.5        22            -5            -110       550    -2,750    13,750
 79.5- 89.5        19            -6            -114       684    -4,104    24,624
 69.5- 79.5        15            -7            -105       735    -5,145    36,015
 59.5- 69.5         0            -8               0         0         0         0
 49.5- 59.5         4            -9             -36       324    -2,916    26,244
 39.5- 49.5         1           -10             -10       100    -1,000    10,000
 29.5- 39.5         1           -11             -11       121    -1,331    14,641

Total           N = 819                         885    11,899    24,561   416,539
                                               Σfx     Σfx^2     Σfx^3     Σfx^4
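For ungrouped data, the relations between moments about the origin and moments about the mean, and the two criteria built from them, can be sketched as follows. This is an illustration only: the scores are hypothetical, the function names are not from the text, and no grouping correction is involved because the data are not grouped.

```python
from math import isclose


def raw_moments(values, max_order=4):
    """Moment coefficients about the origin: sum(X**s) / N for s = 1..max_order."""
    n = len(values)
    return [sum(v ** s for v in values) / n for s in range(1, max_order + 1)]


def central_from_raw(m1, m2, m3, m4):
    """Convert moments about the origin to moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return 0.0, mu2, mu3, mu4


def central_direct(values, s):
    """Moment about the mean computed directly from the deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** s for v in values) / n


def pearson_criteria(mu2, mu3, mu4):
    """Return (sqrt(beta_1), beta_2); the normal curve gives (0, 3)."""
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


scores = [3, 5, 6, 6, 7, 8, 8, 9, 12, 16]  # hypothetical observations
_, mu2, mu3, mu4 = central_from_raw(*raw_moments(scores))
sqrt_beta1, beta2 = pearson_criteria(mu2, mu3, mu4)
print(mu2, mu3, mu4)
print(sqrt_beta1, beta2)
```

Computing the moments both ways, through the powers of X and directly from the deviations, gives a useful arithmetic check before the criteria are referred to the tables.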

Problem VII.1. Testing the normality of a sample by Pearson’s 
method. The fitting of the normal curve to a set of observations is 
carried out on a set of achievement-test scores of 819 students on a final 
examination in a college course in general zoology. The arithmetical 
labor is substantially reduced over that of following directly the process 
specified in Equation (7.04) by taking the origin near the center of the 
distribution and proceeding to work with the class interval as the unit. 
This is done by calculating the moments about the origin of the computa¬ 
tion variable, x. The corrections indicated in Equation (7.06) can then 
be made, putting h = 1. The whole process is followed out as recorded 
in Table 39. 

We shall follow through the calculations in the order in which they 
have been presented in the preceding theoretical discussion. The mean 
and standard deviation of the distribution are as follows: 


\[
\begin{aligned}
\bar{x} &= \frac{885}{819} = 1.08059 \\
\bar{X} &= 144.5 + 10.8059 = 155.3059 \\
s_x &= \sqrt{\frac{\sum f x^2}{N} - \bar{x}^2} = \sqrt{\frac{11{,}899}{819} - (1.08059)^2} = \frac{2993.6693}{819} = 3.65527 \\
s_X &= 36.5527
\end{aligned}
\]

Step 1. Calculate moments about the origin of the computation
variable:

\[
\begin{aligned}
\nu'_1 &= \frac{885}{819} = 1.080586 \\
\nu'_2 &= \frac{11{,}899}{819} = 14.52869 \\
\nu'_3 &= \frac{24{,}561}{819} = 29.98901 \\
\nu'_4 &= \frac{416{,}539}{819} = 508.5946 \\
(\nu'_1)^2 &= 1.1676668 \\
(\nu'_1)^3 &= 1.26176386 \\
(\nu'_1)^4 &= 1.36344459
\end{aligned}
\]

Step 2. Calculate moments about $\bar{x}$:

\[
\begin{aligned}
\nu_1 &= 0 \\
\nu_2 &= \nu'_2 - (\nu'_1)^2 = 14.52869 - 1.16767 = 13.36102 \\
\nu_3 &= \nu'_3 - 3\nu'_1\nu'_2 + 2(\nu'_1)^3 \\
      &= 29.98901 - 3(1.080586)(14.52869) + 2(1.26176386) \\
      &= 29.98901 - 47.098497 + 2.52352772 = -14.585959 \\
\nu_4 &= \nu'_4 - 4\nu'_1\nu'_3 + 6(\nu'_1)^2\nu'_2 - 3(\nu'_1)^4 \\
      &= 508.5946 - 4(1.080586)(29.98901) + 6(1.1676668)(14.52869) - 3(1.36344459) \\
      &= 476.6694
\end{aligned}
\]

Step 3. Correct the moments for grouping by Sheppard's corrections
(for computation variable x, we have h = 1):

\[
\begin{aligned}
\mu_1 &= \nu_1 = 0 \\
\mu_2 &= \nu_2 - \tfrac{1}{12}h^2 = 13.36102 - .08333 = 13.27769 \\
\mu_3 &= \nu_3 = -14.585959 \\
\mu_4 &= \nu_4 - \tfrac{1}{2}\nu_2 h^2 + \tfrac{7}{240}h^4 = 476.6694 - 6.68051 + .029167 = 470.01806
\end{aligned}
\]

Step 4. Calculate $\beta_1$ and $\beta_2$ or $\alpha_1$ and $\alpha_2$:

\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-14.585959)^2}{(13.27769)^3} = .09088713
\]

\[
\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = -.3014
\]

We refer to the tables of $\sqrt{\beta_1}$ (Ref. 18) and find that this deviation,
-.3014, or one greater than this from $\sqrt{\beta_1} = 0$ or $\beta_1 = 0$ for the normal
curve, is to be expected less than once in 100 trials by random sampling
from a normal distribution or population. Thus, the distribution under
consideration deviates significantly from a normal distribution with
respect to $\sqrt{\beta_1}$.

\[
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{470.01806}{(13.27769)^2} = 2.666
\]

We refer to the tables of $\beta_2$ (Ref. 18) and find that the observed value
of $\beta_2$ or one less than this value is to be expected less than 5 times in
100 trials but more than 1 time in 100 trials in random sampling from a
normal population. Thus, the present distribution deviates significantly
at the 5 per cent level from a normal population with respect to $\beta_2$.
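The arithmetic of Problem VII.1 can be retraced mechanically. The sketch below takes the class frequencies of Table 39 in terms of the computation variable x (class interval as unit, so h = 1), forms the moments about the origin, reduces them to the mean, applies Sheppard's corrections, and computes the two criteria. The dictionary layout and function names are assumptions made for the illustration; referring the results to the tables of Ref. 18 is not reproduced here.

```python
# Frequencies from Table 39, keyed by x = (class mid-point - 144.5) / 10.
FREQ = {
    9: 3, 8: 14, 7: 31, 6: 50, 5: 56, 4: 78, 3: 75, 2: 81, 1: 81, 0: 81,
    -1: 77, -2: 53, -3: 46, -4: 31, -5: 22, -6: 19, -7: 15, -8: 0,
    -9: 4, -10: 1, -11: 1,
}


def grouped_raw_moments(freq):
    """Return N and the first four moments about the origin of x."""
    n = sum(freq.values())
    moments = [sum(f * x ** s for x, f in freq.items()) / n for s in (1, 2, 3, 4)]
    return n, moments


def pearson_from_grouped(freq, h=1.0):
    """Pearson's criteria from grouped data with Sheppard's corrections."""
    n, (v1, v2, v3, v4) = grouped_raw_moments(freq)
    # Moments about the mean of the computation variable.
    nu2 = v2 - v1 ** 2
    nu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
    nu4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
    # Sheppard's corrections: second and fourth moments only.
    mu2 = nu2 - h ** 2 / 12
    mu3 = nu3
    mu4 = nu4 - 0.5 * nu2 * h ** 2 + 7 * h ** 4 / 240
    return {
        "N": n,
        "mu2": mu2,
        "mu3": mu3,
        "mu4": mu4,
        "beta1": mu3 ** 2 / mu2 ** 3,
        "sqrt_beta1": mu3 / mu2 ** 1.5,
        "beta2": mu4 / mu2 ** 2,
    }


result = pearson_from_grouped(FREQ)
for name, value in result.items():
    print(f"{name:>10}: {value:.5f}" if name != "N" else f"{name:>10}: {value}")
```

Because the criteria are ratios of moments, they do not depend on the unit of x, which is why the work can be carried out on the computation variable without converting back to raw scores.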

Fitting the Normal Curve to a Set of Observations by the Use of
Cumulants. In 1928, R. A. Fisher developed a new kind of symmetric
function, the k-statistics, which possess the valuable property of giving
particularly simple sampling formulas, obtainable directly by
combinatorial methods, and removing most of the algebraic labor characteristic
of the older methods. The k-statistics, $k_p$ (p = 1, 2, ...), are symmetric
in the observations, $X_1, \ldots, X_n$, so that the mean value of $k_p$ is the
pth cumulant, or $E(k_p) = \kappa_p$.

Fisher's criteria to test for the departure from normality of an
observed sample, known as the statistics $g_1$ and $g_2$, are calculated from the
k-statistics, $k_1$, $k_2$, $k_3$, and $k_4$, which are in turn derived from the sums of
powers, from the second through the fourth, of the deviations from the
mean. The quantity $g_1$ is essentially a measure of asymmetry or
skewness. The parameter $\gamma_1$ of which $g_1$ is an estimate is related to $\pm\sqrt{\beta_1}$
of Pearson's notation as follows:

\[ \gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}} \tag{7.09} \]

The quantity $g_2$ is a measure of the peakedness or flatness of the curve,
that is, its kurtosis. The parameter $\gamma_2$ of which $g_2$ is an estimate is
related to Pearson’s $\beta_2$ in the following way:

\[ \gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3 = \frac{\kappa_4}{\kappa_2^2} \tag{7.10} \]


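A convenient way of calculating the k-statistics is to get first a series
of values $V_1$, $V_2$, $V_3$, $V_4$, defined as follows:

\[
\begin{aligned}
V_1 &= \frac{\sum X}{N} = \bar{X} \\
V_2 &= \frac{\sum X^2}{N} - \bar{X}^2 \\
V_3 &= \frac{\sum X^3}{N} - 3\bar{X}V_2 - \bar{X}^3 \\
V_4 &= \frac{\sum X^4}{N} - 4\bar{X}V_3 - 6\bar{X}^2V_2 - \bar{X}^4
\end{aligned}
\tag{7.11}
\]

The k-statistics are then given by

\[
\begin{aligned}
k_1 &= V_1 \\
k_2 &= \frac{N V_2}{N - 1} \\
k_3 &= \frac{N^2 V_3}{(N - 1)(N - 2)} \\
k_4 &= \frac{N^2 (N + 1)}{(N - 1)(N - 2)(N - 3)}\,V_4 - \frac{3N^2}{(N - 2)(N - 3)}\,V_2^2
\end{aligned}
\tag{7.12}
\]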

If the sums of powers are calculated from grouped data, Sheppard’s
corrections for grouping may be applied as follows:

\[ k'_2 = k_2 - \tfrac{1}{12}, \qquad k'_4 = k_4 + \tfrac{1}{120} \]

However, these corrections should be used for purposes of estimation, not
for testing significance.

The statistics $g_1$ and $g_2$ are given by

\[ g_1 = \frac{k_3}{k_2^{3/2}} \tag{7.13} \]

\[ g_2 = \frac{k_4}{k_2^2} \tag{7.14} \]

For samples from a normal population, $g_1$ is distributed normally
about 0 with a sampling variance

\[ s_{g_1}^2 = \frac{6N(N - 1)}{(N - 2)(N + 1)(N + 3)} \tag{7.15} \]

Similarly, $g_2$ is distributed normally about 0 with a sampling variance

\[ s_{g_2}^2 = \frac{24N(N - 1)^2}{(N - 3)(N - 2)(N + 3)(N + 5)} \tag{7.16} \]
Unless the divergence is marked, large samples are required to detect
departure from normality, because the exact sampling distributions 
of the criteria are not known. 
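The chain from sums of powers to the test ratios can be sketched for ungrouped data as below. The V-values, k-statistics, g-criteria, and sampling variances follow the standard forms of Equations (7.11) to (7.16); the function names and the small data set are hypothetical, and Sheppard's corrections are deliberately left out because the purpose here is a test of significance.

```python
from math import sqrt


def k_statistics(values):
    """Fisher's k-statistics k1..k4 from the sums of powers of X."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are needed for k4")
    mean = sum(values) / n
    v1 = mean
    v2 = sum(v ** 2 for v in values) / n - mean ** 2
    v3 = sum(v ** 3 for v in values) / n - 3 * mean * v2 - mean ** 3
    v4 = sum(v ** 4 for v in values) / n - 4 * mean * v3 - 6 * mean ** 2 * v2 - mean ** 4
    k1 = v1
    k2 = n * v2 / (n - 1)
    k3 = n ** 2 * v3 / ((n - 1) * (n - 2))
    k4 = (n ** 2 * (n + 1) * v4 / ((n - 1) * (n - 2) * (n - 3))
          - 3 * n ** 2 * v2 ** 2 / ((n - 2) * (n - 3)))
    return k1, k2, k3, k4


def normality_criteria(values):
    """Return g1, g2, their standard errors, and the ratios g / s_g."""
    n = len(values)
    _, k2, k3, k4 = k_statistics(values)
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    se_g1 = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_g2 = sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))
    return {"g1": g1, "g2": g2, "se_g1": se_g1, "se_g2": se_g2,
            "t_g1": g1 / se_g1, "t_g2": g2 / se_g2}


# Hypothetical observations; any list of four or more values will do.
sample = [1, 2, 3, 4, 5]
for name, value in normality_criteria(sample).items():
    print(f"{name}: {value:.4f}")
```

Each ratio is referred to the normal table, as in the worked problem that follows in the text; with samples as small as the one used here the normal approximation is of course very rough.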

Problem VII.2. Testing the normality of a sample by Fisher’s
method. An example of the method of testing normality by means of
the g-criteria is given by applying it to a sample of the honor-point ratios
(H.P.R.) of 302 freshmen in the University of Minnesota College of
Agriculture. The calculations are set out in Table 40.

We find that $t_{g_1} = g_1 / s_{g_1} = .166$ and $t_{g_2} = g_2 / s_{g_2} = -1.97$. Entering
the normal table or the t-table with degrees of freedom = ∞, we find
that the respective probabilities are .87 and < .05. Therefore we may
conclude that the hypothesis of normality is rejected at the 5 per cent
level.

Special Treatment of Data to Secure Normal Distributions. Two
alternatives are open to the research worker if he finds that his data do
not conform to a certain model about which a considerable amount is known
and by the use of which the analysis is relatively easy to work out. He may
develop a new model to which his data may conform, or he may transform
his data to make them fit one of the conventional models. The first
alternative is often a problem of considerable mathematical difficulty.
Hence the second procedure is usually followed.

In particular, a large part of statistical theory is built on the
assumptions that the observations are distributed normally and that the variance
is constant. It is often important, therefore, for the research worker to 
show that his measurements are distributed normally or to transform 
them into a form that is normally distributed, or at least into a form 
that has the best possible chance of being so distributed. In some cases 
the normal probability curve gives a very close approximation to the 
observed facts. Although this is not often the case, it is usually possible 
to transform the original observations into some function of them so 
that the function will be distributed normally. In this way the processes 
in subsequent calculations become simplified and the results more com¬ 
prehensive in application. For instance, if the mean and standard 
deviation of the normal distribution are known, the distribution is known 
exactly. If any obtained distribution of observations is established as 
normal, then the known properties of the normal model may be applied 
to it. Tests of significance become more valid and sensitive when the 
sampling distribution is normalized in case of original skewness. 

The linear scale seems to be used in taking observations almost
automatically, as if it were the one unique scale used in nature. This scale
may often be the most convenient way of representing the original
observations, but it need not be for that reason the only way. Should
measurements made in one way follow the normal law, other methods
would not be likely to lead to a similar distribution. For example,
measurements of the volume of an object might be found to follow a
normal distribution whereas measurements of the diameter would not.
Here the measurement of the volume would be the more convenient to
deal with. Since the method of measurement giving a normal
distribution, if it exists, is not known a priori, it is not likely that the appropriate
method will be selected to begin with.


TABLE 40

Calculations in Testing the Normality of a Distribution by the Use of the k-Statistics

The second condition that is often indicated or implied as a necessary 
condition for the unfettered use of statistics is the stability or at least the 
predictability of the variance. Methods of measurement or of trans¬ 
formations giving normal distributions are of special significance when 
the standard deviation is large in comparison with the mean. In cases 
where the standard deviation is small, the effect of any transformation 
is less and, when it is very small, negligible. Both a necessary and 
sufficient condition for the independence of the mean and standard 
deviation in samples is normality in the parent distribution. 

We now consider the nature and use of various transformations 
designed to normalize or stabilize variates so as to render their distribu¬ 
tions more amenable to treatment by statistical methods based on these 
conditions. 

T-Score. In the field of educational psychology, McCall (Ref. 16) 
converted the raw scores on a mental test of an unselected group of twelve- 
year-old children to T-scores. This transformation gives a normal 
distribution of T-scores. The process is illustrated in the transformation 
of the raw scores of 141 freshmen on a science test (Table 41). 

In columns (1) and (2) the raw-score frequency distribution is given.
Column (3) gives the cumulative frequency up to the mid-point of the
respective raw-score units; for example, in row 1, N = 133 + ½(8) = 137.
In column (4) the cumulative percentages are listed; for example, in
row 1, 137/N = 137/141 = 97.13.

The values recorded in column (5) were obtained from the table of 
areas and abscissas of the normal curve (Table I, Appendix). Thus, in 
row 1 the abscissa value of a point, such that 97.13 per cent of the total 
area under the normal curve lies below the ordinate erected at that point, 
is found from the table to be 1.90. 

The T-score values in column (6) are obtained by multiplying each
abscissa value by 10 and adding 50 to the product. Thus, in row 1,
10(1.90) + 50 = 69.

The T-score unit is defined as one-tenth of the standard deviation.
The mean of the distribution of T-scores is 50 and the standard deviation
is 10.
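The steps just described for columns (3) to (6) can be sketched in a few lines. The frequencies are those of the raw-score distribution in Table 41; the function name is hypothetical, and the normal abscissa is obtained from the inverse normal distribution function in the Python standard library rather than from a printed table, so the intermediate values may differ slightly in the last decimal from those read from Table I.

```python
from statistics import NormalDist

# Raw score -> frequency, from columns (1) and (2) of Table 41.
FREQ = {
    50: 8, 49: 8, 48: 7, 47: 8, 46: 8, 45: 7, 44: 6, 43: 5, 42: 7, 41: 6,
    40: 6, 39: 5, 38: 9, 37: 7, 36: 7, 35: 7, 34: 7, 33: 5, 32: 5, 31: 6,
    30: 4, 29: 3,
}


def t_scores(freq):
    """Map each raw score to its T-score.

    For each score: cumulative frequency below the score plus half the
    frequency at the score, converted to a proportion, then to a normal
    abscissa z, then to T = 10z + 50 rounded to the nearest whole number.
    """
    total = sum(freq.values())
    standard_normal = NormalDist()
    below = 0
    result = {}
    for score in sorted(freq):
        f = freq[score]
        cumulative_to_midpoint = below + f / 2
        proportion = cumulative_to_midpoint / total
        z = standard_normal.inv_cdf(proportion)
        result[score] = round(10 * z + 50)
        below += f
    return result


table = t_scores(FREQ)
for score in sorted(table, reverse=True):
    print(score, table[score])
```

The half-frequency at each score keeps the proportion strictly between 0 and 1, so the highest and lowest raw scores still receive finite T-scores.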

TABLE 41

Transformation of Raw Scores on Johnson Science Application Test of 141
Freshman Students to T-Scores

                        Scores lower + ½
                      those at given score     Values of abscissa
Raw score      f         N       Per cent     in standard measure     T-score
   (1)        (2)       (3)        (4)               (5)                (6)

   50          8       137.0      97.13              1.90               69
   49          8       129.0      91.46              1.37               64
   48          7       121.5      86.14              1.09               61
   47          8       114.0      80.82              0.87               59
   46          8       106.0      75.15              0.68               57
   45          7        98.5      69.83              0.52               55
   44          6        92.0      65.23              0.39               54
   43          5        86.5      61.32              0.29               53
   42          7        80.5      57.07              0.18               52
   41          6        74.0      52.47              0.06               51
   40          6        68.0      48.21             -0.04               50
   39          5        62.5      44.31             -0.14               49
   38          9        55.5      39.35             -0.27               47
   37          7        47.5      33.68             -0.43               46
   36          7        40.5      28.71             -0.59               44
   35          7        33.5      23.75             -0.71               43
   34          7        26.5      18.79             -0.89               41
   33          5        20.5      14.53             -1.06               39
   32          5        15.5      10.99             -1.23               38
   31          6        10.0       7.09             -1.47               35
   30          4         5.0       3.55             -1.81               32
   29          3         1.5       1.06             -2.30               27


It is to be noted that measurements of the mental qualities of
individuals may be made so that their distribution will be normal within the
limits of sampling error. This result can be obtained for a large
unselected homogeneous group of individuals usually by constructing a test
or examination comprised of some very easy items, some very difficult 
items, and many items of average or intermediate difficulty. Of course, 
a test can be constructed to conform within limits to whatever shape of 
distribution is wanted by varying the difficulty of the test, the time 
allotment for administering the test, the system of weighting the scoring 
of items, the choice of the unit of measurement, and so forth. Furthermore,
even if the examinations yield results that are normal for a homogeneous
population, the same examination administered to a special
group will likely give scores that are skewed, often as a consequence of
selection or of the inappropriateness of the examination to the group
tested. Whether a normal or some other type of distribution results
from the measurements used, it is obvious that whatever knowledge is
gained about the distribution, it concerns the distribution of the function
of the trait used in the measuring process. This conclusion is valid
because the measurement is indirect, that is, through the measurement
of a functional relationship, the exact nature of which is unknown. Our
measurements are only the manifestation of the underlying trait. The
statement that the mental traits of man are or are not normally distributed
is unproved and unprovable. No amount of experimentation,
for instance, could demonstrate that intelligence is normally distributed.
Our knowledge of its distribution relates to the way in which the mathematical
function we use in measuring intelligence is distributed. The
frequency distribution of Binet I.Q.'s, for example, for a large homogeneous
population is generally held to be normal. However,
even here the extreme lower end of the distribution of I.Q.'s is not
normal, since there is an excess of individuals with low I.Q.'s (see Ref. 19,
page 102). Thus he who makes a test proceeds by first assuming that the
trait is normally distributed and then by deriving measurements which
will conform to this model. When the raw scores for a particular sample
are found to be skew, one means of normalizing them is to convert them
to T-scores.

Only when a trait is measurable directly can the true nature of the 
distribution of the trait become known. Certain biometrical measure¬ 
ments made on random samplings from homogeneous populations may 
be normal. Wechsler (Ref. 21) collected available data for 89 measured 
traits and abilities of human beings. Certain linear measurements, such 
as stature, length of extremities, the various diameters of the skull, and 
certain of their ratios like the cephalic index, were the only distributions 
which might be regarded as normal, although even among these there 
was often considerable asymmetry. 

The Use of Probits in Testing the Normality of Transformations.
The best method of transformation to secure normalization must usually 
be determined by trial and error. The success of any particular method 
can be determined by the application of the standard statistical methods 
previously described. However, a simple graphical method is available 
which can be used to find out which transformations are successful and 
in what respects other transformations are not. The method was 
developed for dealing with toxicological and other dosage-mortality data, 
particularly by Gaddum (Ref. 14) and Bliss (Refs. 3 and 4). Their 
method, that of probits, is first presented in its use for testing the normal¬ 
ity of transformations. 

The probit is defined in terms of the normal equivalent deviation 
(N.E.D.), and is readily determined for any given percentage from the 
unit normal curve. The N.E.D. of a given percentage is the deviation 
(from the mean) equivalent to the given percentage of the area of the 
curve. In order to make all values positive, the probit is the value resulting
from adding 5 to the normal equivalent deviation.¹ The probit
values corresponding to given percentages can be read directly from 
Fisher and Yates's Table IX (Ref. 13). The graphical method consists 
in plotting the appropriate transformations of the observations as 
abscissas either on probability paper or against the corresponding probits 
as ordinates. If the individuals or experimental subjects vary in such 
a way that the measurements or transformed measurements of the experi¬ 
mental factor are normally distributed, the probit should be a linear 
function of the measurement or of its transformation. It is usually 
immediately apparent whether or not the plotted points are randomly 
distributed about a straight line. When they are so distributed, one can 
with practice draw a straight line among the points to fit satisfactorily 
for most practical purposes. 
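
As a small illustration of the definition, the probit of a percentage can be computed directly instead of being read from a table. The sketch below assumes Python's standard-library `NormalDist` as a stand-in for the tabled unit normal curve; the function name `probit` is hypothetical.

```python
from statistics import NormalDist

def probit(percentage):
    """Probit = normal equivalent deviation + 5, for 0 < percentage < 100."""
    return 5 + NormalDist().inv_cdf(percentage / 100)

# Cumulative percentages and their probits; plotting the observations (or a
# transformation of them) against these probits should give a straight line
# when the chosen scale is normally distributed.
for pct in (2.5, 16, 50, 84, 97.5):
    print(pct, round(probit(pct), 2))
```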

It is possible to fit regression lines, and maximum likelihood estimates 
of the population parameter values of the mean and standard deviation 
can be obtained when more precise methods are needed. A straight-line 
probit graph fitted by eye provides the first approximation. Although 
graphical analysis is probably the most efficient method for selecting a 
suitable function, sometimes it is necessary to determine by computation 
whether a given transformation is effective or, alternatively, whether the 
departures from another mode of plotting deviate significantly from 
normality. The standard statistical tests for this purpose, the statistics
$\beta_1$ and $\beta_2$, have been discussed previously. The first, $\beta_1$, measures the
skewness of the presumed normal distribution and determines whether
or not the chief trend of the points is truly linear; $\beta_2$ indicates whether
the secondary trends and twists about the straight line are statistically 
significant. With a small number of observations, only large departures 
from a straight line will be statistically significant. This conclusion will 
have been recognized as obvious during graphic analysis, so that the 
computation may then be seldom worth doing. However, when the
number of observations is sufficient to make grouping advisable (say 50
or more), the calculations leading to the testing of the agreement between
observations and hypothesis may lead to results that are not apparent from
inspection.

The principal use of the graphical method just described is with data
that are limited to the percentages corresponding to the values
of the variable. However, the graphical method is at times useful even
when more complete information is available (Ref. 14). For example, if 
there are N observations of a given variable, one method is to rank them 
according to size. Then the smallest observation is assigned a percentage 

of and to succeeding observations, percentages of . . . , 

These percentages are then changed to probits and each
individual observation is plotted. When the data become sufficient,
they are grouped and added cumulatively and the probits are then
plotted against the points separating the groups. When the number of
cases in a group is very small, it is preferable to plot the individual readings
or to assume an even distribution of the observations over the range
covered by the group.

1 Compare with the T-score, page 158.

Again, if straight lines fit the data, the distributions are normal. The 
mean and standard deviation can be estimated fairly accurately from 
the graph. The reciprocal of the slope of the line gives the estimate 
of the standard deviation. The mean is the value of the abscissa for 
which the probit value (as ordinate) is 5. The customary technique for 
calculating a regression line is not appropriate when the experimental 
results are of the kind just described. The best estimates of the mean 
and standard deviation are obtained by using the ordinary methods 
directly on the transformed observations. When the original observations
are grouped, the most convenient method may be to estimate these 
statistics from the moments of the distribution. 
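
The graphical reading of the mean and standard deviation described above can be written out as simple arithmetic. The sketch below is an illustration only: it assumes that two points have been read off a straight line already drawn by eye on the probit graph, and the function name and the numerical values are hypothetical.

```python
def normal_parameters_from_probit_line(point_a, point_b):
    """Estimate mean and standard deviation from a straight probit line.

    Each point is (abscissa, probit) read from the fitted line.
    """
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)      # probits per unit of the measurement
    sd = 1 / slope                     # reciprocal of the slope
    mean = x1 + (5 - y1) / slope       # abscissa at which the probit equals 5
    return mean, sd

# Hypothetical line passing through probit 4 at x = 85 and probit 6 at x = 115.
mean, sd = normal_parameters_from_probit_line((85, 4.0), (115, 6.0))
print(mean, sd)
```

These are first approximations only; as the text notes, better estimates come from applying the ordinary methods to the transformed observations themselves.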

The method of probits also provides a general graphical method of 
normalizing distributions which may be applied when the scale on which 
the experimental results are measured is altogether arbitrary. If a 
smooth curve is drawn through the points of a random sample of observa¬ 
tions plotted against probits, the curve may be used to convert succeed¬ 
ing observations to a scale of probits. These transformed values are 
necessarily normally distributed. The validity of this procedure requires 
that the shape of the original curve and the variance of the transformed 
curve must be stable. An illustration of the application of this principle 
is given by Ferguson (Ref. 10) in his presentation of methods for the 
estimation of the limen and precision of separate items of a mental test. 
Finney (Ref. 11) applied the method of probit analysis to get the max¬ 
imum likelihood estimates of the two parameters from the data of 
Ferguson. 

The Logarithmic Transformation. It has been found that many 
moderately skew frequency distributions arising from empirical data or 
fulfilling certain theoretical conditions are reduced to normal curves when 
the original observations, x, are transformed to X = log x. A logarithmic
transformation of a variable may not only make the distribution more 
nearly normal but will often stabilize the standard deviation, that is, make 
it more or less independent of the original variable. This stabilizing 
tendency holds where it is found that the standard deviation of the 
original variable is roughly proportional to the mean, or where the vari¬ 
ance is proportional to the square of the mean. This fact makes the 
logarithmic transformation a powerful one. It has also been found 
useful in dealing with new material whose distribution is unknown 
(Ref. 6).

There is also the theoretical justification which indicates that the
log transformation for most scientific observations is probably preferable
to employing no transformation at all. The normal law may predict
negative observations. Under it, the fact that there are men of more than
double the average weight implies the existence of other men with negative
weight. In the case of scores of enlisted men on the Army Alpha Intelligence
Examination, the measures M - 2 S.D. and M - 3 S.D. give the nonexistent
scores of -12 and -49. When logarithmic transformations
of the observations are used, this difficulty does not occur. Measurements
of the size of small bodies of the same shape may be based on the
diameter or on the volume. If the distribution of the volumes is normal,
that of the diameters will necessarily be skew, and vice versa. Again,
the use of logarithms does away with the difficulty. If the logarithms
of the diameters are distributed in a normal manner with a standard
deviation $\lambda$, the logarithms of the volumes will be normally distributed
with standard deviation $3\lambda$ (Ref. 14).

The logarithmic transformation, then, should make easy the interpre¬ 
tation of experimental results when the variations are large. It fre¬ 
quently has a double advantage in making experimental results more 
consistent and in preventing excessive weight from being given to an 
occasionally large aberrant observation. Cochran (Ref. 6) indicated 
that the logarithmic transformation made no significant difference when 
the coefficient of variation was less than 12 per cent. Natural logarithms, 
preferred by the mathematician, and common logarithms to the base 10, 
ordinarily liked better by the experimenter, give equally good results. 
Gaddum (Ref. 14) uses the symbol $\lambda$ to denote the standard deviation
of the logarithm to the base 10. It is worth noting that as a logical 
consequence of the adoption of the method of logarithmic transformation, 
the mean of the logarithms (or the geometric mean of the observations, 
instead of the arithmetic mean) would be regarded as the most probable 
value. 
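
A short sketch of the quantities just mentioned may help: the mean of the common logarithms, their standard deviation, and the geometric mean obtained as the antilogarithm of the mean log. The function name `log_summary` and the three sample values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def log_summary(values):
    """Mean and standard deviation of log10(values), plus the geometric mean."""
    logs = [math.log10(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    variance = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    return {
        "mean_log": mean_log,
        "sd_log": math.sqrt(variance),        # standard deviation of the base-10 logs
        "geometric_mean": 10 ** mean_log,     # antilog of the mean log
    }

summary = log_summary([1, 10, 100])
print(summary)
```

For these values the arithmetic mean is 37, while the geometric mean is 10, which shows how much less weight the logarithmic scale gives to the single large observation.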

Gaddum (Ref. 14) gives general formulas for obtaining the mean and
standard deviation of the transformed distribution when the original
observations have been grouped on an arithmetic scale. These are

$$\bar{X} = \log \bar{x} - 1.1513\,\lambda^2 \qquad (7.17)$$

$$\lambda^2 = 0.4343 \log\left(1 + \frac{\sigma^2}{\bar{x}^2}\right) \qquad (7.18)$$

where $\bar{x}$ and $\sigma$ are the mean and standard deviation, respectively, of the
original distribution, $\bar{X}$ is the mean of the logarithms, and the logarithms
are to the base 10. Gaddum points out that these estimates are
reasonably efficient only when $\lambda$ is less than 0.14 (Ref. 14), when an
estimate of $\lambda$ within 3 per cent can be obtained by dividing the coefficient
of variation (in per cent) by 231.
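
The relations between the moments of the original scale and those of the logarithmic scale can be checked numerically. The sketch below assumes the usual moment relations for a log-normal variable with base-10 logarithms, namely that the variance of the logs is 0.4343 log10(1 + σ²/x̄²) and that the mean of the logs is log10 x̄ less 1.1513 times that variance; the function names and the example values (mean 100, standard deviation 20) are hypothetical.

```python
import math

LOG10_E = 0.4343  # log10(e), rounded to four places

def log_scale_parameters(mean, sd):
    """Mean and standard deviation of log10(x) from the mean and SD of x."""
    ratio = 1 + (sd / mean) ** 2
    lam_sq = LOG10_E * math.log10(ratio)           # variance of the logs
    mean_log = math.log10(mean) - 1.1513 * lam_sq  # mean of the logs
    return mean_log, math.sqrt(lam_sq)

def lambda_shortcut(mean, sd):
    """Approximation: coefficient of variation (per cent) divided by 231."""
    return (100 * sd / mean) / 231

mean_log, lam = log_scale_parameters(100, 20)
approx = lambda_shortcut(100, 20)
print(round(mean_log, 4), round(lam, 4), round(approx, 4))
```

With a coefficient of variation of 20 per cent the shortcut agrees with the fuller formula to well within 3 per cent, in line with the limit of 0.14 quoted above.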

Gaddum proposed to call the distribution of x “log-normal” when the 
distribution of log x is normal. He reports a number of studies which 
show that the log-normal distributions have been found in many fields of 
work. It is also indicated that its use could have facilitated interpreta¬ 
tion of data in certain studies in which difficulties were encountered. In 
Wechsler’s study (Ref. 21), for instance, the curves obtained for many 
of the measurements of human traits were just the kind which are 
improved by using the logarithmic transformation. Gaddum calculated 
the values of $\lambda$ for some of Wechsler's data. For example, the estimated
$\lambda$'s for weight, 0.045 and 0.055, are about three times the $\lambda$'s for height,
0.015, 0.0164, 0.0172, and 0.017.

Muhsam (Ref. 17) proposes the use of a “log-arith” grid for the study 
of relative dispersions of distributions. The log-arith grid is a system of 
rectangular coordinates in which the axis of abscissas is divided log¬ 
arithmically and that of the ordinates arithmetically. Generally, dis¬ 
tribution curves showing equal broadness on a log-arith grid have equal 
relative dispersions. A broader curve indicates higher relative dispersion 
while a narrower curve shows a lower one. This form of graphic repre¬ 
sentation is particularly suitable in the case of log-normal distributions. 

The Square Root and Inverse Sine Transformations. The present 
extensive use of the analysis of variance attaches special significance to 
the usefulness of transformations when there is reason to suspect that the 
theoretical conditions for the application of this technique are not ful¬ 
filled. These theoretical conditions are that the experimental errors to 
which the experimental data are subject are normally and independently 
distributed with the same variance. The logarithmic transformation 
just discussed equalizes the variance when it is proportional to the square 
of the mean. Therefore, this transformation is powerful for dealing with 
quantitative measurements, and it is used as a preparatory step to an 
analysis of variance when dealing with certain types of nonnormal data. 
The main objective in the use of this transformation is to ensure that the 
standard deviation, as calculated from a residual sum of squares, shall 
be applicable to the various “treatment” means, even though the means 
are different. The lack of normality of the distribution of the residual 
errors as observed in practice may be of secondary importance. Curtiss 
(Ref. 9) indicates that the logarithmic transformation may possibly be 
more successful in stabilizing the variance than in normalizing the data. 

A unit frequently used in expressing results of experimental or other 
observational data is the percentage, such as the proportion of the total 
number of observations which have a specified quality. Research work¬ 
ers have only recently considered the problem of including in the experi¬ 
mental designs for collecting this type of data an objective estimate of the 
experimental errors to which the data are subject. The analysis of
variance, uniquely fitted to serve this purpose, was not originally planned
for use with percentages. The problem was one of discovering a transformation
for the original observations which would satisfy the condition
of normality of experimental errors required in the analysis of variance.
The transformation used for this purpose is known as the inverse sine
function, $\sin^{-1}\sqrt{x}$.

The inverse sine transformation applies to fractions or percentages
derived from the ratio of two small integers, when the experimental
errors follow the binomial frequency distribution. Before an analysis of
variance is performed, each percentage is changed to an angle $\theta$ so that
$p = \sin^2 \theta$. As the fraction p varies from 0 to 1 or the observed percentage,
P, from 0 to 100, the angle $\theta$ changes from 0 to 90 deg. In large
samples, the sampling variation of $\theta$ tends to be normally distributed
with a variance dependent only on the number of observations on which
the percentage is determined. The variance on the new scale is 821/n
when $\theta$ is measured in degrees.
Fisher and Yates, in Tables XII and XIII of Ref. 13, provide tables for
converting percentages and fractions to degrees.
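
For readers without the tables at hand, the conversion to angles is easily computed. The following is a minimal sketch with hypothetical function names; the constant 821 is the value quoted in the text and is close to (180/π)²/4, the large-sample variance of the angle expressed in squared degrees.

```python
import math

def angular_transform(p):
    """Angle in degrees such that p = sin^2(theta), for 0 <= p <= 1."""
    return math.degrees(math.asin(math.sqrt(p)))

def angular_variance(n):
    """Approximate sampling variance of the angle (squared degrees) for n observations."""
    return 821 / n

for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(p, round(angular_transform(p), 1))
print(angular_variance(50))
```

Note that the variance depends on n alone and not on p, which is the property that makes the transformed values suitable for the analysis of variance.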

For the sampling distribution of the estimated percentages or pro¬ 
portions to be normal, the population value of p would be .50. For 
values of the parameters departing widely from .50, as between 0 and .25 
and between .75 and 1.00, the sampling distribution would be highly 
skew. For determining measures of sampling errors of such distributions, 
it is necessary to make a transformation of the observational values. The 
inverse sine transformation is the one used here. 

Likewise, for comparing the differences between percentages, particularly
where they deviate widely, as when one is in the tail and the
other near the center of the distribution, the inverse sine transformation
will render them more nearly comparable. Thus, the difference between
two percentages $P_1$ and $P_2$ would become

$$d = \theta_1 - \theta_2 = \sin^{-1}\sqrt{p_1} - \sin^{-1}\sqrt{p_2}$$

and

$$\sigma_d = \sqrt{\frac{821}{N_1} + \frac{821}{N_2}}$$

where $N_1$ and $N_2$ are the sizes of the samples and the angles are in degrees.
Then $x = d/\sigma_d$ is referred to the normal scale.
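
A comparison of two percentages on the angular scale might be sketched as follows. The sketch assumes that the difference of the two angles (in degrees) is divided by the square root of 821/N1 + 821/N2, in keeping with the variance of 821/n quoted earlier for a single angle; the function names and the example figures are hypothetical.

```python
import math
from statistics import NormalDist

def angle(p):
    """Angular transformation in degrees, theta = arcsin(sqrt(p))."""
    return math.degrees(math.asin(math.sqrt(p)))

def angular_difference_test(p1, n1, p2, n2):
    """Normal deviate and two-tailed probability for the difference of two proportions."""
    d = angle(p1) - angle(p2)
    sigma_d = math.sqrt(821 / n1 + 821 / n2)
    x = d / sigma_d
    two_tailed = 2 * (1 - NormalDist().cdf(abs(x)))
    return x, two_tailed

# Hypothetical example: 10 per cent of 80 cases against 30 per cent of 120 cases.
x, prob = angular_difference_test(0.10, 80, 0.30, 120)
print(round(x, 2), round(prob, 4))
```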

Zubin (Ref. 22) has provided nomographs for the test of significance 
between two percentages transformed to the inverse sine function scale. 

When the observational data consist of small integers whose experimental
errors follow the Poisson law, the square-root transformation,
$y = \sqrt{x}$, is used. This transformation is equivalent to the angular
transformation at each end of the percentage scale, that is, from 0 to 20
per cent and from 80 to 100 per cent. For a Poisson distribution with
mean m, the standard error is the square root of m. Hence, if the treatments
in an experiment bring about differences in the values of p and m,
they have different variances. With small whole numbers, treatment
differences must be large before they can be significant. Moreover, the
larger the treatment differences are, the greater the inequality in their
variances is likely to be.

The Poisson distribution is skew and hence there is a known relation
between the standard error and the mean. The theoretical variance of
the transformed values, $\sqrt{x}$, is 1/4. The purpose of the transformation
is to change the data to a new scale in which the experimental variance
is approximately the same for all plots, thus making possible the use of all
plots in estimating the standard error of any treatment comparison.
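
The stabilizing effect of the square root can be verified by direct summation over the Poisson probabilities. This is a sketch for illustration only; the function name is hypothetical, and the sum is simply truncated after a generous number of terms.

```python
import math

def variance_of_sqrt_poisson(m, terms=400):
    """Variance of sqrt(x) when x follows the Poisson law with mean m."""
    prob = math.exp(-m)        # P(x = 0)
    mean_root = 0.0            # running value of E[sqrt(x)]
    mean_x = 0.0               # running value of E[x] = E[(sqrt(x))^2]
    for k in range(terms):
        mean_root += prob * math.sqrt(k)
        mean_x += prob * k
        prob *= m / (k + 1)    # P(x = k + 1) from P(x = k)
    return mean_x - mean_root ** 2

for m in (5, 10, 20, 40):
    print(m, round(variance_of_sqrt_poisson(m), 3))
```

The variance of the untransformed counts equals m and so changes eightfold over this range, whereas the variance on the square-root scale stays close to 1/4, the agreement improving as m increases.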

Normalizing Transformation for Ordinal or Ranked Data. In some
types of experimental data, it may be possible or sufficient only to place a
series of magnitudes in order of preference without knowledge of their
metrical values. For example, in tests of psychological preferences,
individuals may be able to express preferences but cannot assign numerical
values to whatever forces may be operative in bringing about such
preferences. Likewise, in the standardization of food products, an 
important factor is the determination of consumer preferences, which 
may be indicated by the ranking of a given set of products in order of 
choice. 

Where the assumptions underlying the order of ranking are fulfilled, 
namely, the assumptions that the underlying trait may be regarded as 
continuous and normally distributed, the transformation of ordinal data 
to a form that is amenable to further analysis (for instance, to the analysis 
of variance) sometimes may be definitely advantageous. The trans¬ 
formation needed is one which normalizes the data and can be obtained 
by assigning to each item in a series of given size a score equal to the 
expected value for an observation of corresponding rank in a normal 
population with zero mean and unit standard deviation. Tables have 
been prepared for series of all sizes from 2 to 50 items. Such a table is 
given by Fisher and Yates’s Table XX (Ref. 13). Table XXI in the 
same source provides the sum of squares for the transformed score of 
each individual, substantially reducing the labor involved in running the 
analysis of variance. This type of analysis makes possible tests of 
differentiation in preference between classes of subjects of different sex, 
age, or other characteristics. 
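
The scores described above are the expected values of the ordered observations in a sample from the unit normal population. Where the published tables are not at hand, they can be approximated by numerical integration, as in the following sketch. The function name, the integration limits, and the step count are assumptions made for illustration; the sketch is not the method by which the published tables were computed.

```python
import math
from statistics import NormalDist

def expected_normal_scores(n, steps=4000, limit=8.0):
    """Expected values of the n ordered unit-normal observations, smallest first."""
    std = NormalDist()
    h = 2 * limit / steps
    xs = [-limit + i * h for i in range(steps + 1)]
    pdf = [std.pdf(x) for x in xs]
    cdf = [std.cdf(x) for x in xs]
    scores = []
    for r in range(1, n + 1):
        # Density of the r-th smallest of n: n!/((r-1)!(n-r)!) F^(r-1) (1-F)^(n-r) f
        coeff = r * math.comb(n, r)
        total = 0.0
        for i, x in enumerate(xs):
            weight = 0.5 if i in (0, steps) else 1.0   # trapezoid rule end points
            total += weight * x * pdf[i] * cdf[i] ** (r - 1) * (1 - cdf[i]) ** (n - r)
        scores.append(coeff * total * h)
    return scores

# Scores to be assigned to ranks in a series of five items (lowest rank first).
print([round(s, 2) for s in expected_normal_scores(5)])
```

The scores for any series sum to zero, so the transformed ranks have zero mean, and they can then be carried into an analysis of variance in the usual way.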

Bliss (Ref. 5) gives a complete description of the technique for trans¬ 
forming ranks and of its application to a problem of testing consumer 
preferences. Sandon (Ref. 19) has prepared a nomograph for the scoring 
of rank data on school examinations. 

Problems 

1. Set up a list of statistical tools that depend for their efficiency upon 
the fulfillment of the conditions of normality of the measurements of 
the trait or characteristic in the population sampled.

2. What are the effects of nonnormality on the validity of tests of significance:
the z or F test, the two-tailed t-test, the one-tailed t-test?

3. Test the hypothesis of normality of the following distribution of scores 
on the factual information test of the 1947 Minnesota State Board 
Examination in Biology administered in a representative sampling of 
56 Minnesota high schools (Anderson, 1949). Use the method of 
Pearson. 


Score    Frequency        Score    Frequency

 25           1             12        173
 24           3             11        159
 23          24             10        129
 22          26              9        109
 21          73              8         49
 20          90              7         28
 19         122              6         18
 18         179              5         11
 17         206              4          6
 16         227              3          1
 15         255              2          1
 14         218              1          0
 13         240
                           Total    2,348


4. Test the hypothesis of normality of the following distribution of 
first-quarter honor-point ratios of a random sample of 122 students 
in the College of Agriculture of the University of Minnesota. Use 
the criteria of Fisher. 


     H.P.R.           Frequency

  2.76 to  3.00            1
  2.51 to  2.75            2
  2.26 to  2.50            9
  2.01 to  2.25            3
  1.76 to  2.00            5
  1.51 to  1.75            8
  1.26 to  1.50           13
  1.01 to  1.25           14
  0.76 to  1.00           11
  0.51 to  0.75           14
  0.26 to  0.50           14
  0.00 to  0.25           11
 -0.24 to  0.00            9
 -0.49 to -0.25            3
 -0.74 to -0.50            2
 -0.99 to -0.75            1
 -1.24 to -1.00            2

  Total                  122


5. Use the graphical method involving the use of probits for testing the
normality of the distribution of honor-point ratios in Problem 4. 

References 

1. Aitken, A. C., Statistical Mathematics, 3d ed. London: Oliver & Boyd, 1944.

2. Bartlett, M. S., “The Use of Transformations,” Biometrics, Vol. 3 (1) (1947),
pp. 39-52.

3. Bliss, C. I., “The Calculation of the Dosage-Mortality Curve,” Annals of
Applied Biology, Vol. 22 (1935), pp. 134-167.

4. -, “The Methods of Probits,” Science, Vol. 79 (1934), pp. 38-39;
409-410.

5. -, “A Technique for Testing Consumer Preferences with Special
Reference to the Constituents of Ice Cream,” Connecticut Agricultural
Experiment Station Bulletin 251 (1943).

6. Cochran, W. G., “Some Difficulties in the Statistical Analysis of Replicated
Experiments,” Empire Journal of Experimental Agriculture, Vol. VI,
No. 22 (1938), pp. 157-175.

7. -, “The χ² Correction for Continuity,” Iowa State College Journal of
Science, Vol. XVI (1942), pp. 421-436.

8. Cramér, Harald, “Random Variables and Probability Distributions,”
Cambridge Tracts in Mathematics No. 36. London: Cambridge University
Press, 1937.

9. Curtiss, J. H., “On Transformations Used in the Analysis of Variance,”
Annals of Mathematical Statistics, Vol. XIV (1943), pp. 107-122.

10. Ferguson, G. A., “Item Selection by the Constant Process,” Psychometrika,
Vol. 7 (1942), pp. 19-29.

11. Finney, D. J., “The Application of Probit Analyses to the Results of Mental
Tests,” Psychometrika, Vol. 9 (1944), pp. 31-39.

12. Fisher, R. A., “Moments and Product-Moments of Sampling Distributions,”
Proceedings of the London Mathematical Society, Vol. 30 (1928), pp. 199-238.

13. -, and Yates, F., Statistical Tables for Biological, Agricultural and
Medical Research. London: Oliver & Boyd, 1938.

14. Gaddum, J. H., “Lognormal Distributions,” Nature, Vol. 156, No. 3964
(1945), pp. 463-466.

15. Johnson, Palmer O., and Tsao, Fei, “Testing a Certain Hypothesis Regarding
Variances Affected by Means,” Journal of Experimental Education, Vol. 13
(1945), pp. 145-149.

16. McCall, W. A., How to Measure in Education. New York: The Macmillan
Company, 1922.

17. Muhsam, H. V., “Representation of Relative Variability on a Semilogarithmic
Grid,” Nature, Vol. 158, No. 4013 (1946), p. 453.

18. Pearson, Karl (ed.), Tables for Statisticians and Biometricians. London:
Biometrika, University College.

19. Sandon, A., “Scores for Ranked Data in School Examination Practice,”
Annals of Eugenics, Vol. XIII, Part 2 (1946), pp. 118-121.

20. Stevens, W. L., “The Logarithmic Transformation,” Nature, Vol. 158, No.
4018 (1946), p. 622.

21. Wechsler, D., The Range of Human Capacities. Baltimore: The Williams &
Wilkins Company, 1935.

22. Zubin, Joseph, “Nomographs for Determining the Significance of the
Differences Between the Frequencies of Events in Two Contrasted Series
or Groups,” Journal of the American Statistical Association, Vol. 24 (1939),
pp. 539-544.



CHAPTER VIII 


STATISTICAL ANALYSIS OF DATA UNDER NONNORMAL 

ASSUMPTIONS 

A type of data met with at times, particularly in psychology, consists 
of rankings which may arise from material not capable of quantitative 
measurement on a variate scale but arranged in order according to some 
qualitative characteristic. This might be, for example, the problem of 
arranging musical compositions in the order of preference by a group of 
students. Another problem consists in ranking according to two vari¬ 
ables: the arrangement of a set of musical compositions in the order of 
preference by a group of professional musicians and by a group of lay¬ 
men. The relationship between the two sets of rankings is of interest. 
Another type of data in this field would be produced by having a judge 
rate individuals on a five-point scale according to some trait. Trans¬ 
formations of these types of data are sometimes made. For example, the 
ranked data may be transformed into normally distributed data as 
described in Chapter VII. In another method the ranked data are 
distributed into groups, so that the frequencies in the various groups 
follow the normal scale. Scores on a linear scale are then assigned to the 
groups. Further statistical treatment usually follows, such as computing 
the product-moment correlation coefficient, using multivariate analysis 
or factor analysis. 

Before deciding to make such transformations, the critical investigator 
will examine his data and the conditions under which they were collected, 
to determine whether the assumptions underlying the transformation 
can be reasonably accepted. He may find that they cannot be and hence 
decide that a transformation is not warranted. There are a number of 
simpler statistical methods available which do not require the assump¬ 
tions of the more elaborate methods suggested above. They enable a 
direct attack to be made on the data. Some of these methods will now 
be pointed out, particularly those whose usefulness has been enhanced 
by the development of means for testing their significance. 

The Method of Rank Correlation. The rank-correlation method as
developed by Spearman is well known. It is recommended, however,
that the principal use prescribed for it in elementary texts on statistics
be abandoned. This use consists in assuming that Spearman's rho
may be used as a substitute for the product-moment correlation coefficient,
with the aid of the tables usually given for obtaining the product-moment
equivalent. The formula due to K. Pearson, $r = 2 \sin(\pi\rho/6)$, gives the
relationship between the product-moment coefficient, $r$, and the rank correlation,
$\rho$, when the variates are normal. The assumptions underlying the
equivalence, that there are no ties in rank and that the intervals between
successive ranks are equal, are not likely often to be found in practice.
The use of the rank correlation presented here is as a test of significance.

The Rank Correlation as a Test of Significance. Recent contributions
to our knowledge of the rank correlation enable us to use it effectively
as a test of the existence of correlation, that is, to test the hypothesis
that the qualities under consideration are independent, or rather, that the
judgments of them are independent (Ref. 4). Under such conditions, the
pairs of rankings of n members drawn at random are independent. Thus,
for large numbers of samples, every ranking of one quality will occur in
equal frequencies with every ranking of another quality. If one ranking
is fixed in the order (1, 2, . . . , n), it may be correlated with the n!
possible permutations of these members. Thus, the exact probability
that any correlation result could be due to random sampling errors can
be calculated.

This method of the calculation of probability values becomes laborious
and practically prohibitive when n is of any substantial size. Olds
(Ref. 10), however, has provided tables which give probability values
to a close approximation. He tabled the probability values based upon
the distributions of $\sum(d^2)$. The latter is simply related to $r'$, the rank
correlation, by the equation

$$r' = 1 - \frac{6\sum(d^2)}{n^3 - n} \qquad (8.01)$$


The rank correlation is of special value in testing significance when 
there is no knowledge of the form of the bivariate distribution or in the 
case where the form of the distribution is, or is believed to be, non-normal. 
It should be pointed out that scarcely anything is known about the sig¬ 
nificance of rank correlation in correlated populations. 
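
For small n the exact probability can be found by enumerating all n! permutations, as the argument above describes. The sketch below is illustrative only; the function names are hypothetical, and the enumeration is practical only for small series because n! grows so rapidly.

```python
from itertools import permutations

def rank_correlation(sum_d2, n):
    """Rank correlation from the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n ** 3 - n)

def exact_probability(observed_sum_d2, n):
    """Chance, under independence, of a sum of d^2 no larger than the one observed."""
    fixed = range(1, n + 1)
    count = total = 0
    for perm in permutations(fixed):
        total += 1
        sum_d2 = sum((a - b) ** 2 for a, b in zip(fixed, perm))
        if sum_d2 <= observed_sum_d2:
            count += 1
    return count / total

# With four ranks, perfect agreement (sum of d^2 = 0) arises in 1 of the 24 permutations.
p_perfect = exact_probability(0, 4)
print(rank_correlation(0, 4), round(p_perfect, 3))
```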

Problem VIII.1. Testing the significance of rank correlation. An
example is presented to illustrate the test of significance of a rank correlation
coefficient by means of Olds's Tables.

The ranks $R_1$ and $R_2$ were assigned to 12 individuals with respect to
two qualities with the results shown in the accompanying table.

We enter Table V (Ref. 10, page 148) with n = 12 and $\sum d^2$ = 94; we
find the probability of not exceeding 94 by chance is between .02 and .01.
Therefore, we may conclude that there is a correlation between the two
rankings.
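
The arithmetic of this problem can be reproduced directly from the two sets of ranks. In the sketch below the ranks are entered as they are printed in the table for this problem (the second ranking contains the value 7 three times, which is kept as given); the variable names are hypothetical.

```python
# Ranks of the 12 individuals A through L on the two qualities.
r1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
r2 = [1, 7, 7, 2, 4, 9, 3, 7, 5, 12, 10, 11]

n = len(r1)
d = [b - a for a, b in zip(r1, r2)]            # rank differences
sum_d2 = sum(x * x for x in d)                 # sum of squared differences
r_prime = 1 - 6 * sum_d2 / (n ** 3 - n)        # rank correlation

print(sum(d), sum_d2, round(r_prime, 3))
```

The value of the sum of squared differences obtained here is the quantity with which the published probability table is entered.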

Individual     R1      R2       d       d²

    A           1       1       0        0
    B           2       7       5       25
    C           3       7       4       16
    D           4       2      -2        4
    E           5       4      -1        1
    F           6       9       3        9
    G           7       3      -4       16
    H           8       7      -1        1
    I           9       5      -4       16
    J          10      12       2        4
    K          11      10      -1        1
    L          12      11      -1        1

  Totals                        0       94

Problem VIII.2. Combination of the information from two tests of
significance. Another use to which the rank difference correlation may
be put is the combination of rank and contingency methods suitable for
utilizing simultaneously two kinds of information contained in group
data. Table 42, concerning first-year students entering one of the

colleges of the University of Minnesota, gives the number of those who 
offered two credits in high-school mathematics and those who presented 
more than two credits in mathematics at the various levels of rating on the 
College Aptitude Test (C.A.T.). 


TABLE 42

Freshman Students Classified according to College Aptitude Rating and the
Number of Entrance Credits in High-School Mathematics

Units of high-school          College aptitude percentile rating
mathematics                  1-25    26-50    51-75    76-100    Total
(a) Two years                 69      103      176      127       475
(b) More than two years       27       25       39       20       111
Proportion a/(a + b)         .719     .805     .819     .864      .812
Rank on C.A.T.                 1        2        3        4
Rank of the proportion         1        2        3        4      r' = +1


Two tests of significance, independent of each other, are applied to
the data: the chi-square test, $\chi^2$, and the rank-correlation
coefficient, r'.

The chi-square test of the independence of the principles of
classification gives the following results: $\chi^2$ = 8.118, P = .046. The
rank-difference correlation, r', between the two series, C.A.T. as one
variable and the proportion a/(a + b) as the other variable, gives
r' = +1, P = .042.

To test whether the aggregate of these two tests is significant, we have
the following data:

  P         -log_e P     Degrees of freedom
 .046        3.0791              2
 .042        3.1701              2
 Total       6.2492              4

$\chi^2$ = 2(6.2492) = 12.4984; .01 < P < .02. The probability of the
hypothesis of independence of college aptitude rating and the number of
units of high-school mathematics taken (two or more than two) is
approximately .014, by interpolation. Interpolation in the $\chi^2$-table
for 4 d.f.:

  P        $\chi^2$     $\log_{10} P$
 .02       11.668       $\bar{2}.30103$
 .014      12.498       $\bar{2}.14600$
 .01       13.277       $\bar{2}.00000$
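
The two steps of Problem VIII.2 can be retraced numerically. The following is a minimal sketch, not the author's own procedure: the function names are illustrative, the first count in row (a) of Table 42 is taken as 69 (the value implied by the row total 475 and the proportion .719), and the tail probability uses the closed form available for exactly 4 degrees of freedom in place of table interpolation.

```python
import math

# Table 42 counts by C.A.T. quartile. The first entry of row (a) is taken
# as 69, the value implied by the row total 475 and the proportion .719.
two_years = [69, 103, 176, 127]
more_than_two = [27, 25, 39, 20]


def chi_square(rows):
    """Pearson chi-square statistic for a rectangular contingency table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def combine_two_probabilities(p1, p2):
    """Return -2*(ln p1 + ln p2) and its upper-tail probability for 4 d.f."""
    stat = -2 * (math.log(p1) + math.log(p2))
    # For 4 d.f. the chi-square upper tail is exp(-x/2) * (1 + x/2).
    tail = math.exp(-stat / 2) * (1 + stat / 2)
    return stat, tail


chi2 = chi_square([two_years, more_than_two])
combined, p_combined = combine_two_probabilities(0.046, 0.042)
print(round(chi2, 3), round(combined, 4), round(p_combined, 4))
```

The closed-form tail is specific to combining two probabilities; with more tests the degrees of freedom rise by 2 per test and a general chi-square routine or a table would be needed.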


Problem VIII.3. Analysis of variation by the method of ranks.
Friedman (Ref. 1) has developed the method of ranks, which was designed
to study variation by using ranked data instead of the original
quantitative values, avoiding the assumption of normality in the original
data. The method can also be used where the available data relate to order
only or to a qualitative character capable only of being ranked. This
method makes use of the statistic $\chi_r^2$, which is related to Kendall's
coefficient of concordance W (see page 174) as follows:

$$\chi_r^2 = m(n - 1)W \tag{8.02}$$

The distribution of $\chi_r^2$ tends to approach that of $\chi^2$ with
(n - 1) degrees of freedom as m tends to infinity. Some significance
levels of $\chi_r^2$ have been provided (Ref. 1).

TABLE 43

Ranks of Percentages of College Attendance for Specified Levels of College
Aptitude and of Socioeconomic Status

Rows: college aptitude intervals 100, 90-99, 80-89, 70-79, 60-69, 50-59,
40-49, 15-39.
Columns: ranks based on percentage of college attendance by socioeconomic
status (Below 15, 15-18, [two classes illegible], 27-30, Above 30).

Ranks in column                 (a) Sum     (b) Mean   (c) Deviation   (d) Deviation
(as legible)                    of ranks    rank                       squared
6, 5, 6, 6, 6, 5.5, 2, 4          40.5       5.063        1.563         2.442969
2, 4, 4, 4, 4, 5.5, 5, 6          34.5       4.313        0.813         0.660969
2, 1, 5, 2, 3, 2 [incomplete]     21.5       2.688       -0.812         0.659344
4.5, 3, 3, 5, 2, 4, 6, 5          32.5       4.063        0.563         0.316969
[illegible]                       30.0       3.750        0.250         0.0625
[illegible]                        9.0       1.125       -2.375         5.640625

Theoretical mean = 3.5. Sum of deviations squared = 9.783376.

The example given in Table 43 shows the procedure of the method of
ranks. The data are given by Schultz (Ref. 13).

(1) The ranks were obtained by arranging in ascending order the
percentages of male high-school graduates for each row (the
college aptitude levels).

(2) The next step was to obtain the mean rank for each column,
given in line (b).

(3) The third step was to obtain the difference between the mean
rank for each column and the theoretical mean 3.5, i.e.,
$\tfrac{1}{2}(p + 1)$, where p is the number of ranks.

(4) The sum of squares of the differences in (3) was obtained.

(5) Then $\chi_r^2$ was found as follows:

$$\chi_r^2 = \frac{12}{np(p + 1)} \sum_{j=1}^{p} \Bigl( \sum_{i=1}^{n} r_{ij} \Bigr)^2 - 3n(p + 1) \tag{8.03}$$

where $r_{ij}$ is the rank entered in the ith row and the jth column; n
is the number of ranks averaged. In terms of the deviations of the
column mean ranks from $\tfrac{1}{2}(p + 1)$, this is the same as
multiplying the sum of squares in (4) by 12n/p(p + 1). Thus:

$$\chi_r^2 = \frac{96}{42}(9.783376) = 22.365$$

(6) The $\chi^2$-table is entered with 5 degrees of freedom.

(7) Since P is less than .01, it was inferred that there was a significant
association between socioeconomic status and college attendance
where college ability was controlled.

Wallis (Ref. 14) gives a formula for calculating the statistic $\eta_r$,
the rank correlation ratio:

$$\eta_r^2 = \frac{p(p + 1)\chi_r^2/12}{np(p^2 - 1)/12} = \frac{\chi_r^2}{n(p - 1)}$$

from which

$$\eta_r^2 = \frac{22.365}{8(5)} = .5591 \tag{8.04}$$

and $\eta_r$ = .75.

Finally, the value of .75 is an estimate of the rank correlation ratio
between socioeconomic status and percentage of college attendance when
college aptitude is controlled.
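
Steps (4) to (7) and the rank correlation ratio can be retraced from the three quantities quoted in the example. The sketch below assumes the mean-rank form of the statistic, which is algebraically identical to Equation (8.03); the function names are illustrative only, and small differences in the last digit against the printed figures come from rounding in the tabled deviations.

```python
# Quantities quoted in the worked example for Table 43.
n = 8                    # rows (college aptitude levels) that were ranked
p = 6                    # ranks per row (socioeconomic classes)
sum_sq_dev = 9.783376    # sum of squared deviations of the column mean ranks


def friedman_from_mean_ranks(n, p, sum_sq_dev):
    """chi_r^2 = 12 n / (p (p + 1)) * sum of (mean rank - (p + 1)/2)^2."""
    return 12 * n / (p * (p + 1)) * sum_sq_dev


def rank_correlation_ratio_sq(chi_r2, n, p):
    """eta_r^2 = chi_r^2 / (n (p - 1)), the reduced form of Equation (8.04)."""
    return chi_r2 / (n * (p - 1))


chi_r2 = friedman_from_mean_ranks(n, p, sum_sq_dev)
eta_sq = rank_correlation_ratio_sq(chi_r2, n, p)
eta = eta_sq ** 0.5
print(round(chi_r2, 2), round(eta_sq, 3), round(eta, 2))
```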

The Case of Multiple Rankings. The problem arises in practice of 
how to determine the agreement among a number of rankings and how to 
obtain an estimate of a true ranking if a significant concordance among 
sets of rankings exists. This is the case when there are m rankings of n 
instead of two. For instance, a group of students might be asked to 
arrange the photographs of a number of persons unknown to them with 
respect to their judgments as to the unknown persons’ intelligence. It 
is desired to test whether there is a community of judgments between the 
students. Of course this experiment is not equivalent to determining a 
relationship based on order of experimental findings. There could be a 
substantial agreement about an incorrect order which might be different 
from the one established by the score of a valid and reliable intelligence 
test. 

Problem VIII.3. Computing and testing the significance of the
coefficient of concordance. Let the following represent the rankings of
three observers of 8 objects, $A_1, \ldots, A_8$:

                               Objects
Observer        A_1   A_2   A_3   A_4   A_5   A_6   A_7   A_8
1                7     4     2     6     5     3     1     8
2                4     2     1     7     6     3     5     8
3                7     2     1     6     4     5     3     8
Sum of ranks    18     8     4    19    15    11     9    24

The sum of the sums of the ranks of the columns must be 108, that is,
mn(n + 1)/2, where m is the number of observers and n the number of
objects. If the concordance were perfect the sums would be 3, 6, 9, 12,
15, 18, 21, and 24, though not necessarily in that order. If there is
little or no agreement, the sums are approximately equal. The variance
of these sums gives a measure of the ranking concordance.

Kendall (Reference 9, page 411) derives a coefficient of concordance,
W, as

$$W = \frac{12S}{m^2(n^3 - n)} \tag{8.05}$$

where S is the sum of the squares of deviations of the column sums from
their mean, m(n + 1)/2. If agreement is perfect, then the sums of the
columns are m, 2m, . . . , nm and S is $m^2(n^3 - n)/12$. The range in
values of W is from 0 to 1.

In the example above,

$$\text{Mean} = \frac{m(n + 1)}{2} = \frac{3(8 + 1)}{2} = 13.5$$

$$S = (18 - 13.5)^2 + (8 - 13.5)^2 + (4 - 13.5)^2 + (19 - 13.5)^2 + (15 - 13.5)^2 + (11 - 13.5)^2 + (9 - 13.5)^2 + (24 - 13.5)^2 = 310.00$$

$$W = \frac{12(310)}{9(8^3 - 8)} = .82$$

To test the significance of an observed value of W, it is essential to
determine the distribution of W (or, more conveniently, of S) in the
population, which is obtained by permuting the n ranks in all possible
ways in each of the m rankings. Kendall (Ref. 6) gives the distribution
for some low values of n and m and indicates how to approximate for large
values through the use of a continuous distribution. The latter can be
done by the use of the z-distribution where

$$z = \frac{1}{2} \log_e \frac{(m - 1)W}{1 - W} \tag{8.06}$$

$$\nu_1 = (n - 1) - \frac{2}{m} \tag{8.07}$$

and

$$\nu_2 = (m - 1)\left[(n - 1) - \frac{2}{m}\right] \tag{8.08}$$

In making this test for low values of m and n, it is desirable to apply the
usual correction for continuity by reducing S in Equation (8.05) by unity
and increasing the divisor by 2.

We shall illustrate by testing the significance of the obtained value,
$W_0$ = .82:

$$W = \frac{310 - 1}{378 + 2} = .813$$

$$z = \frac{1}{2} \log_e \frac{2(.813)}{1 - .813} = 1.08$$

$$\nu_1 = 6\tfrac{1}{3}, \qquad \nu_2 = 12\tfrac{2}{3}$$

For $\nu_1$ = 6 and $\nu_2$ = 13: $z_{.001}$ = 1.0306.

Hence, for z = 1.08, P < .001.


The estimate of the true ranking of the objects is intuitively given by
taking as rank 1 that object whose sum of ranks is the least. In our
problem that object is $A_3$, followed by objects $A_2$, $A_7$, $A_6$,
$A_5$, $A_1$, $A_4$, and $A_8$. This ranking is obtained by rearranging
the 8 totals in rank order.
This solution is given a firmer theoretical basis by showing that it is
“best” in a least-squares sense. If any two of the S’s are equal, this 
method is indeterminate, and priority would be given to the object 
which has the lesser sum of squares of ranks. Where two objects have 
the same set of ranks, the specific ranking of each can be decided by 
tossing a coin or by selecting the ranks in a way most unfavorable to the 
hypothesis under test. An alternative solution might be obtained by 
splitting the ranks, giving each of the doubtful objects the same rank. 
This method, however, introduces severe theoretical difficulties in making 
tests of significance. 
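
The whole of the concordance calculation, from the column sums through the z test to the estimated ranking, can be sketched as follows. The code recomputes S directly from the table of rankings rather than from the printed intermediate figures. Observer 1's rank for the fifth object is taken as 5, the value required by the column sum of 15; the function names and the `continuity` flag are illustrative assumptions.

```python
import math

# Rankings of the 8 objects A_1..A_8 by the three observers.
# Observer 1's rank for A_5 is taken as 5, the value required by the
# column sum of 15.
rankings = [
    [7, 4, 2, 6, 5, 3, 1, 8],
    [4, 2, 1, 7, 6, 3, 5, 8],
    [7, 2, 1, 6, 4, 5, 3, 8],
]


def concordance(rankings, continuity=False):
    """Kendall's W = 12 S / (m^2 (n^3 - n)), optionally continuity-corrected."""
    m, n = len(rankings), len(rankings[0])
    sums = [sum(col) for col in zip(*rankings)]
    mean = m * (n + 1) / 2
    s = sum((t - mean) ** 2 for t in sums)
    divisor = m * m * (n**3 - n) / 12
    if continuity:
        # Reduce S by unity and increase the divisor by 2.
        s, divisor = s - 1, divisor + 2
    return s / divisor, sums


def z_statistic(w, m, n):
    """z of Equation (8.06) with degrees of freedom from (8.07) and (8.08)."""
    z = 0.5 * math.log((m - 1) * w / (1 - w))
    nu1 = (n - 1) - 2 / m
    nu2 = (m - 1) * nu1
    return z, nu1, nu2


w, sums = concordance(rankings)
w_corrected, _ = concordance(rankings, continuity=True)
z, nu1, nu2 = z_statistic(w_corrected, m=3, n=8)

# Estimated true ranking: objects ordered by increasing sum of ranks.
order = sorted(range(1, 9), key=lambda k: sums[k - 1])
print(sums, round(w, 3), round(w_corrected, 3), round(z, 3), order)
```

Ties among the sums are not handled here; as the text notes, a tie would have to be broken by the sum of squares of ranks or by some other convention.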

The Method of Paired Comparisons. In the method of paired
comparison, the observer compares each object with every other one. He
indicates which object in a pair he prefers. This method was developed
in psychology in the late 1890's. Its use, however, was limited to that
of a descriptive statistic. Recently, statistical methods have been
developed for testing the consistency of an individual's comparisons
and also of the agreement between observers or judges. These
developments should enhance the value of the method for research purposes,
particularly for the situations for which it has a unique value. In
ranking, for example, if the quality under consideration is not measurable
on a linear scale, the resulting ranking may give not only a faulty
presentation of an observer's preference but also of the variation of the
quality in the individuals. Thus in judging preferences in musical
composition it is not unlikely that an auditor would judge A as preferable
to B, B to C, and C to A. “Inconsistent” preferences of this kind could not
occur in ranking, since, if A is placed above B, and B above C, then A is
automatically placed above C. Cases also arise in which the judgments of
untrained individuals are wanted who might be capable of comparing
pairs of individuals with respect to some quality but would not likely be
able to rank all the members of even a relatively small group. In animal
experiments or in experiments with very young children, for example, in
determining food choices, rankings would not be possible. But paired
comparisons could be used by presenting the food in pairs and noting
which food was eaten first.

Coefficient of Consistence in Paired Comparisons. Kendall (Refs. 6
and 8) gives a method of deriving a coefficient of consistence which
indicates how consistent a judge or observer is in making preferences.
If an individual observer produces a configuration of inconsistent
preferences, the reasons may be that (1) he is incompetent to judge, (2) the
differences among the objects may be too small to detect, (3) the attention
of the observer may wander during the experiment, (4) the quality under
comparison may not be representable by a linear variable.


With n objects, each of the possible pairs, $\binom{n}{2}$ in number, is
presented to the subject and his preference of one member of the pair is
noted. If the object A is preferred to B, it may be indicated as A → B.
In general, if an observer makes preferences of the type
A → B → C → D → E → F . . . there is no inconsistency, and this case
corresponds to ordinary ranking. The criterion of inconsistence is the
“circular” triad. If the n objects are considered as the vertices of a
regular polygon of n sides and each vertex is joined with every other
one, the direction of the choice can be indicated. Thus, if A is
preferred to B, the symbol in the diagram is A → B. Any triangle in the
figure in which the arrows all point in the same direction is a
“circular” triad. Thus, if an observer makes preferences of type
A → B → C → A, the triad ABC is said to be inconsistent.

Kendall (Ref. 6) proved that the maximum possible number of
circular triads is $\frac{n^3 - n}{24}$ if n is odd and
$\frac{n^3 - 4n}{24}$ if n is even; the smallest number is zero. If d is
the number of circular triads in an observed configuration of
preferences, he defines $\zeta$, the coefficient of consistence, as

$$\zeta = 1 - \frac{24d}{n^3 - n} \quad (n \text{ odd}), \qquad \zeta = 1 - \frac{24d}{n^3 - 4n} \quad (n \text{ even}) \tag{8.09}$$

From these equations, it is observed that $\zeta$ is unity when there are no
inconsistencies in the configuration. As the coefficient decreases to zero,
the inconsistence, as determined by the number of circular triads, increases.
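
Counting circular triads by hand is tedious for more than a few objects, so a brute-force count is a convenient check. The sketch below is an illustration only: the matrix layout (`prefers[a][b]` true when object a is preferred to object b), the function names, and the three-object example are assumptions made here, not taken from Kendall's papers.

```python
from itertools import combinations


def circular_triads(prefers):
    """Count circular triads; prefers[a][b] is True when a is preferred to b."""
    n = len(prefers)
    count = 0
    for a, b, c in combinations(range(n), 3):
        # A triad is circular when its three arrows form a cycle,
        # in either direction.
        if prefers[a][b] and prefers[b][c] and prefers[c][a]:
            count += 1
        elif prefers[b][a] and prefers[c][b] and prefers[a][c]:
            count += 1
    return count


def consistence(d, n):
    """Coefficient of consistence, Equation (8.09)."""
    denominator = n**3 - n if n % 2 == 1 else n**3 - 4 * n
    return 1 - 24 * d / denominator


# Hypothetical judge with three objects: A -> B, B -> C, C -> A.
cycle = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
d = circular_triads(cycle)
print(d, consistence(d, 3), round(consistence(1, 9), 3))
```

With three objects a single circular triad is already the maximum, so the coefficient falls to zero; with nine objects one triad lowers it only slightly.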

The next problem is to determine the statistical significance of $\zeta$,
that is, to answer the question: With what probability can an obtained
value of $\zeta$ arise by chance if the judge assigns his preferences at
random in relation to the quality under examination?

With n objects, the number of possible configurations of preferences
is $2^{\binom{n}{2}}$. Kendall discusses the procedure of investigating
the distribution of d in this population of $2^{\binom{n}{2}}$ different
members, namely, the method of proceeding from the distribution for n to
that for (n + 1). He gives tables with the frequencies and probabilities
for the distribution of d for n up to and including 7.

Coefficient of Agreement for m Observers. Kendall (Ref. 6) derived
a coefficient of agreement in which the judgments of m observers are
obtained by the method of paired comparisons. The coefficient u is
given by

$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1 \tag{8.10}$$

where m = the number of observers, n = the number of objects judged, and

$$\Sigma = \sum_{i \neq j} \binom{\gamma_{ij}}{2} \tag{8.11}$$

where $\gamma$ is the number in each cell.

The coefficient of agreement, u, is unity if and only if there is
unanimous agreement in the comparisons. Its minimum value is -1 only
when m = 2. Kendall gives tables which enable one to make an exact
test of significance of u for the following values of m and n: m = 3, n = 2
to 8; m = 4, n = 2 to 6; m = 5, n = 2 to 5; m = 6, n = 2 to 4. He has
also demonstrated that the $\chi^2$-approximation provides an adequate test
of significance for values of m and n outside the range of the tables.
The expression

$$\frac{4}{m - 2}\left[\Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m - 3}{m - 2}\right] \tag{8.12}$$

is distributed as $\chi^2$ with

$$\nu = \binom{n}{2}\frac{m(m - 1)}{(m - 2)^2} \text{ degrees of freedom} \tag{8.13}$$


Problem VIII.4. Calculating the coefficient of agreement. A class
of 67 ninth-grade boys were asked to state their preferences with respect
to 9 school subjects. Each boy was asked to place an X in front of the
one member of each of the 36 pairs of subjects which interested him more
when he studied it. The preferences are shown in Table 44. The
problem is to determine the similarity of preferences among the boys.
The measure of agreement is the coefficient of agreement as given in
Equation (8.10).

TABLE 44

Preferences of 67 Ninth-Grade Boys in 9 School Subjects*

Subject                    1    2    3    4    5    6    7    8    9   Totals
1. Physical Education          41   55   56   58   56   58   57   62     443
2. Industrial Arts        26        57   55   57   56   54   60   63     428
3. Literature             12   10        28   36   38   36   40   60     260
4. Mathematics            11   12   39        29   34   40   37   51     253
5. Social Studies          9   10   31   38        34   40   40   51     253
6. Science                11   11   29   33   33        36   43   53     249
7. Spelling                9   13   31   27   27   31        34   48     220
8. Art                    10    7   27   30   27   24   33        47     205
9. Composition             5    4    7   16   16   14   19   20          101
                                                               Total    2412

* This table is read by considering the subject at the left of each row as
being preferred $\gamma$ times over the subject at the top of the column
which locates any particular square, where $\gamma$ is the number in that
square. For example, Physical Education is preferred by 41 boys over
Industrial Arts.

The calculations required are as follows: The calculation of $\Sigma$ as
given by Equation (8.11) can be shortened when the objects are arranged
in order of total number of preferences by using the following relation:

$$\Sigma = \Sigma(\gamma^2) - m\Sigma(\gamma) + \binom{m}{2}\binom{n}{2} \tag{8.14}$$

where the summation is now carried out over the half of the table below
the diagonal. The numbers in this half being smaller than those in the
other half, the arithmetic is simpler.

Hence,

$$\Sigma(\gamma^2) = 26^2 + 12^2 + \cdots + 20^2 = 17{,}914$$

$$\Sigma(\gamma) = 26 + 12 + \cdots + 20 = 712$$

$$m\Sigma(\gamma) = (67)(712) = 47{,}704$$

$$\binom{m}{2} = \frac{67 \times 66}{2} = 2211, \qquad \binom{n}{2} = \frac{9 \times 8}{2} = 36$$

$$\binom{m}{2}\binom{n}{2} = (2211)(36) = 79{,}596$$

$$\Sigma = 17{,}914 - 47{,}704 + 79{,}596 = 49{,}806$$

$$u = \frac{2 \times 49{,}806}{79{,}596} - 1 = 0.2515$$


To test the significance of u, we calculate $\chi^2$ according to Equation
(8.12). Thus:

$$\frac{4}{67 - 2}\left[49{,}806 - \frac{79{,}596}{2} \cdot \frac{67 - 3}{67 - 2}\right] = 653.57$$

is distributed as $\chi^2$ with

$$\nu = \frac{(36)(67)(66)}{(67 - 2)^2}$$

or 37.7 degrees of freedom. The large value of $\nu$ justifies the use of
the normal approximation to the $\chi^2$-distribution. Then

$$\sqrt{2\chi^2} - \sqrt{2\nu - 1} = 27.5$$


This is a highly improbable result on the hypothesis of a random
assignment of preferences. Therefore, the coefficient 0.2515 is statistically
significant. It may be concluded that there is a certain amount of
agreement, though not a strong one, among the boys in their preferences for
school subjects.
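
The calculation for Table 44 can be reproduced from the entries below the diagonal alone. The following sketch evaluates the shortened relation (8.14), checks it against the definition (8.11) by pairing every cell with its mirror image m - γ above the diagonal, and then evaluates Equations (8.10), (8.12), and (8.13). Variable names are illustrative, and `math.comb` assumes Python 3.8 or later.

```python
import math

m, n = 67, 9   # observers (boys) and objects (school subjects)

# Entries of Table 44 below the diagonal, row by row (rows 2 to 9).
below_diagonal = [
    [26],
    [12, 10],
    [11, 12, 39],
    [9, 10, 31, 38],
    [11, 11, 29, 33, 33],
    [9, 13, 31, 27, 27, 31],
    [10, 7, 27, 30, 27, 24, 33],
    [5, 4, 7, 16, 16, 14, 19, 20],
]
cells = [g for row in below_diagonal for g in row]

pairs = math.comb(m, 2) * math.comb(n, 2)

# Shortened form, Equation (8.14).
big_sigma = sum(g * g for g in cells) - m * sum(cells) + pairs

# Definition, Equation (8.11): each cell g together with its mirror m - g.
direct = sum(math.comb(g, 2) + math.comb(m - g, 2) for g in cells)

u = 2 * big_sigma / pairs - 1                                      # (8.10)
chi2 = 4 / (m - 2) * (big_sigma - pairs / 2 * (m - 3) / (m - 2))   # (8.12)
nu = math.comb(n, 2) * m * (m - 1) / (m - 2) ** 2                  # (8.13)

# Normal approximation for large nu.
normal_deviate = math.sqrt(2 * chi2) - math.sqrt(2 * nu - 1)
print(big_sigma, round(u, 4), round(chi2, 2), round(nu, 1), round(normal_deviate, 1))
```

The agreement of `direct` with `big_sigma` confirms that the shortened relation loses nothing by ignoring the upper half of the table.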

Problem VIII.5. Measuring the consistency of choices by use of
paired comparisons. The distribution of circular triads of a random
sample of 15 ninth-grade boys and the coefficients of consistence for
preference in school subjects calculated from Equation (8.09) were as
follows:

Student Number     d       ζ
 1                 0     1.000
 2                 0     1.000
 3                 0     1.000
 4                 0     1.000
 5                 0     1.000
 6                 0     1.000
 7                 0     1.000
 8                 0     1.000
 9                 1     0.967
10                 1     0.967
11                 1     0.967
12                 1     0.967
13                 3     0.867
14                 3     0.867
15                 8     0.733

For 8 of the boys, there were no circular triads. Therefore, the
coefficients were 1.000; that is, $\zeta = 1 - \frac{24(0)}{9^3 - 9} = 1$.
For the remaining 7, there were 4 coefficients of value 0.967, 2 of 0.867,
and 1 of 0.733.


It may be concluded that these students were able to give a consistent 
set of choices of school subjects by use of paired comparisons. The 
reader is invited to validate these conclusions by making the appropriate 
tests of significance. 


Problems 

1. Before an examination, a teacher ranked her class of 25 students
according to their expected achievements. After the examination,
the rank was determined according to total score. What can be said
about the teacher's estimation of the abilities of the students?

Student    Teacher's rank    Examination rank
a                 1                  5
b                 2                  1
c                 3                  9.5
d                 4                 22
e                 5                  4
f                 6                 16.5
g                 7                 11.5
h                 8                 19
i                 9                  9.5
j                10                 21
k                11                  7.5
l                12                 24
m                13                 14
n                14                  7.5
o                15                  2
p                16                  3
q                17                 25
r                18                  6
s                19                 16.5
t                20                 15
u                21                 20
v                22                 23
w                23                 13
x                24                 18
y                25                 11.5


2. Combine the information from two tests of significance, the chi-square 
test, and the rank correlation coefficient applied to the data in Problem 
11, Chapter V, page 100. 

3. The following tabulation represents the rankings of 5 students based
on their preferences for four different musical compositions:

                  Composition
Student     A_1    A_2    A_3    A_4
1            1      3      4      2
2            2      1      3      4
3            2      3      4      1
4            2      1      4      3
5            3      1      2      4

(a) Compute the coefficient of concordance and test its significance.

(b) If a significant concordance among the sets of rankings is found,
combine the rankings to obtain the estimate of the true ranking.

4. The following data represent the rankings according to interests in
high-school subjects of a random sample of 28 boys in the eleventh
grade. The rankings were obtained by three different methods: (1)
paired comparison, (2) order of merit, and (3) rating.

                                     Ranks of subjects
Method               Phys.   Ind.                  Soc.
                     Ed.     Arts   Lit.   Math.   Sci.   Sci.   Spell.   Art   Comp.
Paired comparison     1       2      3      4       5      6       7       8      9
Order of merit        1       2      4.5    4.5     3      6       7       8      9
Rating                2       1      4      6       3      5       8       7      9

(a) Test the significance of the difference in ranks by the three 
methods. 

(b) If a significant association is found, estimate the amount of 
association among the three methods. 

5. The following tabulation shows the preferences of 67 ninth-grade girls
in 9 school subjects:

Subject                    1    2    3    4    5    6    7    8    9   Totals
1. Literature                  33   41   41   45   48   51   56   60     375
2. Home Economics         34        38   38   41   48   50   50   59     358
3. Physical Education     26   29        28   34   40   46   53   58     314
4. Spelling               26   29   39        34   38   45   46   48     305
5. Mathematics            22   26   33   33        39   45   45   41     284
6. Art                    19   19   27   29   28        42   43   44     251
7. Social Studies         16   17   21   22   22   25        36   36     195
8. Composition            11   17   14   21   22   24   31        32     172
9. Science                 7    8    9   19   26   23   31   35          158
                                                               Total    2412


(a) Compute the coefficient of agreement u. 

(b) Test the significance of u. 

(c) Compare the value of u for girls with the value of u for boys from 
the same school given in Table 44. 

6. Construct, administer, and analyze the results from a test designed to
measure the attitude and its intensity of a specified population toward
some pressing educational issue. (Consult Guttman, Louis, and
Suchman, Edward A., “Intensity and a Zero Point for Attitude
Analysis,” American Sociological Review, Vol. 12 (1947), pp. 58-67.)


CHAPTER IX


SAMPLING THEORY AND PRACTICE 

We shall now attempt to make available to the reader some of the
results from investigations about sampling from the point of view of their
use in the construction of clearer, more concise, and better-organized
designs of sampling surveys and experiments. It is expected that the
reader will become able to extend and deepen his knowledge of sampling
principles by further reading of the more technical accounts and to apply
his knowledge to the particular scientific problems in which he is
interested. Although our chief interest here is in the empirical or
observational parts of applied statistical science, the theoretical part
previously developed is basic. Here, as elsewhere in science, both the
theoretical and empirical parts are essential: the progress of a science is
dependent on their reciprocal influence and simultaneous advancement.

The theoretical part of science is, presumably, based on exact
ascertainments, and its purpose is to develop the structure, relationships,
and results of hypotheses. The appropriateness and applicability of a
conceptual model involve the confirmation or refutation by observation
of the hypotheses which enter into the model. The hypotheses must be
changed if they are not supported by experience and observation. An
adequate scientific methodology evolves through comparisons and
evaluations of scientific theories, both from the standpoint of their
essential parts and their efficiency in practice. The more explicit the
theory is, the more amenable it becomes to the detection of errors or
deficiencies that it may possess.

Observation is the basic process of empirical science. The empirical
side of science obtains, criticizes, and systematizes the observations.
It unites the observations with the theoretical propositions and in this
process may reject the hypotheses of the theory, if found necessary. It
should be remembered, however, that the empirical side of science is also
directed by hypotheses. The specification of the conditions under
which the observations are to be made and the form in which they are
to be collected are governed or guided by theory. Within the reciprocal
relationships, it is probably mutually advantageous that the speculative
and the observational sides of science should work somewhat
independently, each by its own special method.

Although statistical theory has been concerned chiefly with random
sampling, considerable resourcefulness, based perhaps chiefly on common
sense and intuition, has resulted in the development of new and effective
systems of sampling designs. Much study is being given to the
development of needed statistical theory basic to estimating the relative
efficiency of different systems of sampling. Sampling is an excellent
illustration of the link between theory and practice and of how
difficulties are discovered and resolved as they arise in the problems met
with in experience.

From an early date, governments have engaged in the collection of
statistics of population, commerce, production, consumption, prices,
wages, income, and, more recently, of statistics bearing on problems of
social need and human welfare. Hence, statistics was originally political
arithmetic to a great extent. The standard method for the collection of
these statistics has been complete coverage and enumeration, of which the
classical example is the population census. Theoretically, at least for
those population characteristics which remain relatively constant, this
procedure appears to be the best. But such an undertaking is costly,
difficult to plan and conduct, limited to relatively few items of
information, time-consuming, and liable to be out of date by the time the
results are published. In fact, the government, even with its great
resources and facilities, can carry on complete censuses only at rather long
intervals. The exigencies of World War II required the collection
of many types of data, which could be done only by the use of sample
surveys. It is also worth noting that other governmental investigations
had at various times resorted to sampling. In the 1940 census, for
instance, the Bureau of the Census was able to broaden the scope of its
inquiries by including a set of supplementary questions which were
answered by a sample of 1 person in 20. Special sampling surveys for
securing statistical information are now often made by unofficial agencies
and by private individuals, usually to provide the lacking official statistics.

In recent years we have witnessed the extension of sampling
methods to a great diversity of situations and for a variety of purposes, for
example:

(1) To find out the most efficient particular pattern and location 
of the observations in an experiment in physics. 

(2) To sample a growing crop for studies in plant physiology, 
agricultural meteorology, and others. 

(3) To estimate the amount of acreage devoted to a particular crop 
or to forecast the expected yields from the economically more 
important crops. 

(4) To investigate the nature and extent of economic and social 
problems, such as unemployment, housing, delinquency, and 
crime. 

(5) To discover the factors influencing consumers' demands.

(6) To measure public opinion on political, economic, and similar
problems; to detect the effects of propaganda.

(7) To determine the frequency distribution of the length of
sentences or other factors to characterize the styles of various
authors.

(8) To investigate local government by examining the local laws
in a few selected years over a 200-year period.

(9) To study methods of technical control in the manufacture of
technical products.

(10) To ascertain the location and frequency of individuals having
special talents, such as persons able to withstand the rigors of
dive-bombing, or individuals with certain types of color-blindness
that make them valuable as observers who can detect
camouflage.

Most of these investigations would probably be impossible from the
standpoint of expense, time, and utility of findings if it were necessary
to investigate the whole field of inquiry in any detail. Furthermore, some
investigations require destructive tests; hence, there would be no point
to the investigation if the destruction of the whole were essential.

Sampling, of course, is an everyday affair. From time immemorial 
it has played an essential role in carrying out common human activities. 
Primitive man who sampled food before he gave it to his children relied 
on the statistical principle of sampling without knowing that he did so 
or that such principles exist. The modern housewife relies on the quality 
of the sample before she purchases in quantity. 

Probably because of the rapidly increasing use of sampling in
experimentation and in survey studies, rapid development is taking place in
the theory and design of sampling investigations.

Sampling Designs. The planning of sampling designs is usually 
involved in two situations: extensive survey studies, descriptive or 
analytical; and experimental investigations, which are more restrictive. 
In both situations the sampling problem is that of securing accurate and 
representative samples. A representative sample is one in which the 
measurements made on its units are equivalent to those which would be 
obtained by measuring all the elements of the population, except for the 
inaccuracy due to the limited size of the sample. 

The principal questions which relate to the setting out of an
investigation by sample are

(1) What is the best size of the sampling units? 

(2) What number of sampling units should be used to secure the 
desired degree of precision in the estimates to be made? 

(3) What system of sampling will secure the optimum allocation of 
the sampling units among the population or its subdivision? 

Population. To answer these questions, certain assumptions about
the unknown population must be made. It is fundamental to use
methods of sampling and of estimation that are based on a minimum of
unavoidable assumptions and that also make unambiguous their exact
implications. It may be stated in advance that there is not one faultless
method of sampling. The method to be used is contingent upon the
nature of the material available and obtainable for the particular problem
under investigation.

In practice, most populations are finite in character; the universe is
comprised of a finite number of members. The condition of an infinite
universe, one which contains an infinite number of members, is assumed
to be fulfilled in practice by sampling with replacement. A large part of
statistical theory is also built on the assumption that the universe is
continuous, that the members or some measurable variable make up a
continuous set.

A population is called existent if all members can be enumerated or if 
the members can be designated by a law of formation. For instance, 
the inhabitants of the United States and the universe of positive integers 
are existent universes. In cards and dice games and roulette, potential 
universes consist of the millions of combinations of 52 cards, of the 
millions of throws of a six-sided die, and the millions of turns of a roulette 
wheel with its 37 numbers. These need only be imagined as hypothetical 
universes. Likewise, a population of experiments is a hypothetical 
universe. 

The usual practice of the statistician is to refer to the bulk that is 
being sampled as the population, the universe, or the supply. The choice
of a population or universe is a necessary first step in an investigation 
based on samples. The definition of the population to be covered in the 
investigation is an integral part of the statement of the purpose of the 
study. 

Randomness. The concept of randomness is fundamental in sampling 
theory and practice, but it is rarely if ever defined, except perhaps in 
mathematical language of which the following is illustrative: “A sequence
of variates $x_1, \ldots, x_n$ is said to be a random series, or to satisfy
the condition of randomness, if $x_1, \ldots, x_n$ are independently
distributed with the same distribution; i.e., if the joint cumulative
distribution function (c.d.f.) of $x_1, \ldots, x_n$ is given by the product
$F(x_1) \cdots F(x_n)$ where $F(x)$ may be any c.d.f.” (Ref. 10).

Restricted to the written word, the condition of randomness seems 
to be based on certain intuitive principles which give practical results. 
Randomness is a fundamental idea in connection with the selection of 
values of a variate from a population. The principle is implied in the 
criterion of random sampling that every member of the population 
should have an equal and independent chance of being included in the 
sample. 

Tests of randomness are of the greatest significance, since statistical
inference is strictly valid only for random samples. It is also a matter of
great practical and scientific importance to determine whether the
fluctuations manifested by a series of observations are random in character or
whether they may be assumed to be the outcome of some factor operating 
under a definite law. 

Testing for randomness is an important problem in quality control 
of manufactured products and also of special importance in the analysis 
of time series. The need for such tests has resulted in considerable 
research for criteria of randomness. 
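
One widely used criterion of this kind is the runs test of Wald and Wolfowitz. The text does not single out any particular test, so the choice of test, the name runs_test_z, and the use of the normal approximation are assumptions made here for illustration only. The sketch counts runs above and below the median and compares the count with its expectation under randomness.

```python
import math

def runs_test_z(series):
    """Runs test about the median; returns (number of runs, z statistic)."""
    ordered = sorted(series)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    # True = above the median, False = below; ties with the median are dropped.
    signs = [x > median for x in series if x != median]
    n1 = sum(signs)              # observations above the median
    n2 = len(signs) - n1         # observations below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    # A new run starts wherever two neighbouring signs differ.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    if variance <= 0:
        raise ValueError("series too short for the normal approximation")
    return runs, (runs - expected) / math.sqrt(variance)

# Hypothetical series: a steady trend gives too few runs,
# a strict alternation gives too many.
trend_runs, trend_z = runs_test_z([1, 2, 3, 4, 5, 6, 7, 8])
zigzag_runs, zigzag_z = runs_test_z([1, 8, 2, 7, 3, 6, 4, 5])
print(trend_runs, round(trend_z, 2))
print(zigzag_runs, round(zigzag_z, 2))
```

A large negative z suggests a trend or slow drift; a large positive z suggests a regular oscillation. Either result is evidence against randomness.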

Bias. If a sample has been chosen from a population in such a way
as not to be a random sample, then no valid estimate can be made from 
it of a population parameter. 1 If a sample has been selected by a random 
method, it gives a result that progressively approaches the population 
value as the sample is increased in size, assuming that an unbiased method 
of estimation has been used. If the results obtained are too high or too 
low, then the sample is called biased. The difference between the value
determined by a very large sample and the parameter or population 
value is termed an error of bias. 

Errors of bias follow no known laws by which their amount might be 
estimated. Errors of bias are incorporated, therefore, with random 
errors and may thus result in spurious estimates of the latter. In
sampling designs every caution is necessary to avoid errors of bias. Even
if an efficient method of sampling has been used, errors of bias may
arise in a number of ways. For instance, biases have been observed
in sampling surveys of households where nobody was found at home
when the interviewer called for the first time. The smaller the
family, the smaller are the odds that someone will be at home. Unless
the visits are continued until complete enumeration is obtained, errors of
bias will arise in connection with size of families and other characteristics
associated with it. Other instances of bias in sample surveys may be
traced to factors such as bias and irregularity in the interviewer,
imperfections in the design of the questionnaire, and errors arising from
nonresponse on the part of the interviewee.

A classical example of bias arising from an unrepresentative selection 
of respondents and from the erroneous belief that a large sample could 
overcome such an error is furnished by the attempt of The Literary 
Digest in 1936 to predict the results of the Presidential election.
Approximately ten million post cards were mailed to people whose names were
listed in telephone directories and in files of owners of automobiles. Of
the 2,350,176 replies received, only 40.4 per cent were in favor of Franklin
D. Roosevelt for President. In the election, he received 60.7 per cent
of the votes cast. The error of bias was, therefore, approximately 20 per
cent. The sample was biased in that the respondents did not constitute
a random sample of those citizens who voted in this election.

1 In systematic sampling, for instance in stratified sampling, the number of
elements to be selected from any stratum must be selected at random.
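
The figures quoted for The Literary Digest poll show why sample size alone cannot remove an error of bias. The short Python sketch below uses only the numbers given in the text. The binomial standard error it computes is the one that would apply if the replies had been a random sample, which is precisely the assumption that failed; the variable names are illustrative.

```python
import math

replies = 2_350_176        # post cards returned (from the text)
sample_share = 0.404       # share favoring Roosevelt among the replies
election_share = 0.607     # share of the votes actually cast for Roosevelt

# Standard error of a proportion under simple random sampling.
standard_error = math.sqrt(sample_share * (1 - sample_share) / replies)

# Observed discrepancy between the poll and the election.
error_of_bias = election_share - sample_share

print(f"random-sampling s.e.: {100 * standard_error:.3f} percentage points")
print(f"observed discrepancy: {100 * error_of_bias:.1f} percentage points")
print(f"discrepancy in s.e. units: {error_of_bias / standard_error:.0f}")
```

The random-sampling error for more than two million replies is a few hundredths of a percentage point, so the discrepancy of about 20 points cannot be attributed to sampling fluctuation.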

Questionnaire studies in which the sample selects itself, voluntary 
replies to requests for opinions on some controversial issue, and letters 
written to editors of newspapers—all are likely to represent mainly 
persons who have strong views on the issues one way or another. 

Systems of Sampling 

The origin of the sampling problem is in the necessity of estimating 
certain characteristics of a population usually so large that it is practically
impossible to examine every member of the population, or so large that 
the time and cost required to do so would prohibit the undertaking. 
In this undertaking, it is essential to consider how best to take the sample 
and to obtain the estimates, and with what precision the estimates have 
been made. The fundamental statistical problem is, therefore, that of 
estimation. 

Unrestricted Random Sampling. A particularly simple form of 
sampling technique is illustrated by the classical urn problem. By 
counting the number of balls of each color in the sample drawn from the 
urn, the relative proportion of balls of different colors in the sample is 
determined. From these proportions the color composition of the balls 
in the urn is inferred. By using the properties of the familiar binomial 
or multinomial distributions, the margin of error of the estimate can also 
be calculated. 
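
The urn calculation can be sketched in a few lines of Python. The urn composition, the sample size of 400, the seed, and the function name estimate_proportion are hypothetical; the standard error is the usual binomial one, which assumes drawing with replacement (or an urn large relative to the sample).

```python
import math
import random

def estimate_proportion(sample, color):
    """Return the sample proportion of `color` and its binomial standard error."""
    n = len(sample)
    p_hat = sum(1 for ball in sample if ball == color) / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, standard_error

# Hypothetical urn: 3,000 red and 7,000 white balls.
urn = ["red"] * 3000 + ["white"] * 7000
rng = random.Random(1)                           # fixed seed, repeatable draw
sample = [rng.choice(urn) for _ in range(400)]   # drawing with replacement

p_hat, se = estimate_proportion(sample, "red")
print(f"estimated share of red: {p_hat:.3f}, "
      f"approximate 95 per cent margin: {1.96 * se:.3f}")
```

The factor 1.96 is the normal approximation to the binomial for a 95 per cent margin; for small samples or extreme proportions exact binomial limits would be preferable.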

An analogous situation in principle might be the estimation of the 
occupational classification of the 16 to 17 million men, twenty-one
to thirty-six years of age, who in 1940 registered in accordance with
the Selective Service Act. Let us assume that each individual had a
registration number which was written on a paper and enclosed in a
separate capsule and that all capsules were placed in a container utilizing
compressed air to secure a constant rotation. One thousand capsules
would be drawn at random and the corresponding occupations
ascertained. In order that statistical principles might be used in a valid way,
it is fundamental that each member of the sample should be chosen 
strictly at random, which means a method of selection by which each 
member of the population has an equal and independent chance of being 
included in the sample, and that the method of selection is completely 
independent of the characteristics to be examined. This is the method 
of purely random sampling, sometimes called unrestricted or the unitary 
unrestricted type of sampling. This method is regarded as being capable 
of giving the most accurate results in cases where the elements of the 
statistical population have equal chances of inclusion and where there is 
no prior knowledge of the population sampled to provide a basis for 
selecting individuals.

Systematic Sampling Methods. In contrast to the method of simple
random sampling, a number of methods have been developed which
may be called systematic methods. These methods utilize prior knowledge
of the individuals comprising a universe with the view to increasing
accuracy and representation of samples. They generally use more
complex forms of random sampling called representative sampling.

Stratification. One of these systematic methods is based on the use
of knowledge of population characteristics, first to divide the population
into more homogeneous groups or strata and then to select at random
the sampling units from each of these groups. This method has been
called restrictive random sampling or the method of stratification. It is in
effect a weighted combination of random subsamples. Various
principles have been used to distribute the sampling units among the several
strata. One, called stratified proportionate sampling, is based on the
distribution of sampling units purely proportional to the total number of 
units in each stratum. In simple random sampling this proportion is 
left to chance. Another basis is to take the number of sampling units 
per stratum proportional to the product of the number of sampling 
units in the stratum by their standard deviations. 
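
The two bases of allocation just described can be compared in a short Python sketch. The stratum sizes, standard deviations, total sample of 90, and function names are hypothetical values chosen so that the arithmetic is easy to follow.

```python
def proportional_allocation(n, stratum_sizes):
    """Sample numbers proportional to the number of units in each stratum."""
    total = sum(stratum_sizes)
    return [n * N_k / total for N_k in stratum_sizes]

def sd_weighted_allocation(n, stratum_sizes, stratum_sds):
    """Sample numbers proportional to (units in stratum) x (standard deviation)."""
    weights = [N_k * s_k for N_k, s_k in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [5000, 3000, 2000]     # hypothetical numbers of units per stratum
sds = [4.0, 10.0, 20.0]        # hypothetical standard deviations per stratum

proportional = proportional_allocation(90, sizes)
sd_weighted = sd_weighted_allocation(90, sizes, sds)
print(proportional)
print(sd_weighted)
```

The second rule moves sampling units away from the large but homogeneous first stratum toward the small but variable third stratum. In practice the results would be rounded to whole numbers.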

Stratified sampling is used in the Gallup polls of public opinion in 
order to secure representative proportions of various classes of people 
rather than to rely on the chance determination of these proportions. 
In the interviews that are made, each subject supplies sufficient
information about himself to permit classification according to (1) part of the
country, (2) the urban or rural district, (3) socioeconomic status, (4) 
political affiliation, (5) age, (6) sex. The particular type of stratification 
used depends on the problem under inquiry. 

While some progress has been made, the methods in use for predicting 
elections are not yet scientific. Among other hazards, the sample design 
may reflect erroneous judgments as to the factors (used for controls in 
stratification) truly associated with the characteristic under investigation. 
Serious biases may also be introduced because the selection of the
sampling units within a stratum to be interviewed is not done at random,
making it impossible to obtain an unbiased measure of sampling error from the
internal evidence of the responses themselves. Furthermore, the 
population composed of eligible citizens who subsequently go to the polls 
and vote is difficult, if not quite impossible, to specify in advance of 
sampling and the trait itself is susceptible to change without notice. 

Cluster Sampling. The method of stratified sampling is also used 
where the unit of sampling is a group rather than the individual. This 
method, sometimes called cluster sampling, is especially important in the
study of human populations when the individuals are often grouped (as
by families, inhabitants of single houses or apartment houses or of
blocks, and so on) as in the census, for instance, and it becomes very
difficult to sample individuals at random under such circumstances.
Most uses of this method apply a system of “exclusive units,” where no
individual or group is included in more than one sampling unit.
Mahalanobis (Ref. 19) has used a variant of the method called the “zonal
configurational” type or the “overlapping system of grid sampling,” in
which the same individual or group may form a part of more than one 
sampling unit. He points out that this method is analogous to sampling 
from an urn with replacement. 

Purposive Selection. A method of systematic sampling essentially
different in principle is that which is called purposive selection. Instead
of making a random selection of the sampling units within strata, this
method selects such groups of units as have the weighted sample
means of certain characteristics, the controls, in close agreement with the
population values. This method might save time and labor at times. 
However, it has often proved to be very hazardous and inaccurate, 
probably because the sampling units are large and few in number, so 
that it is difficult to secure a representative sample. Furthermore, the 
method hypothesizes a considerable knowledge of the population in 
advance of the sampling process. This information is not often
available, and it has been found in a particular case that the facts about the
population needed for controls served only for the particular year when 
the sampling survey was made (Ref. 23). 

Applications of the purposive method have been made in certain 
economic surveys by selecting so-called “typical” counties. The
practice of selecting a particular school or groups of schools in which
experiments are conducted may also be an illustration of this method,
especially if general conclusions are drawn for the educational factors under
investigation. 

Double Sampling. A method of systematic sampling designed
especially for sampling human populations is the one called double sampling
(Ref. 21). This method involves two sampling investigations. The first
consists in drawing a large unrestricted sample from the population and
determining for each individual the value of a character on which
information is easy and relatively inexpensive to collect. This
secondary character is known to be closely correlated with the primary 
character with which the investigation is concerned. The collection of 
data concerning the values of the primary character is expensive. The 
second investigation consists in drawing a small sample in which the 
values of both the primary and secondary characters are ascertained. In 
this method, discussed by Neyman (Ref. 21), the large sample is used to 
stratify the population into groups within which the secondary character 
is relatively homogeneous. Since the two characters are highly
correlated, this procedure will also result in an effective means of stratification
with respect to the principal character. It is possible, therefore, to
proceed with the drawing of the small sample out of the strata comprising
the large sample. Accordingly, a more accurate estimate of the primary 
character may be expected to be obtained from the stratification based 
on the first investigation. The first sample must be large enough to 
provide an accurate estimate of the population numbers if increased 
accuracy of estimation is to result through the double sampling method. 

A variant of this method is to find the regression of the primary on the 
secondary character from the data in the small sample. The predicted 
value in the regression equation which corresponds to the mean value of 
the second factor in the large sample is then used to estimate the mean 
value of the primary character for the total population (Ref. 1). 
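
The regression variant can be sketched as follows. The function name regression_estimate and the data are hypothetical, and a simple least-squares straight line fitted to the small sample is assumed; the text does not state the form of regression used in Ref. 1.

```python
def regression_estimate(large_x, small_x, small_y):
    """Estimate the population mean of the primary character y.

    large_x : secondary character measured on the large (cheap) sample
    small_x : secondary character measured on the small sample
    small_y : primary character measured on the same small sample
    """
    n = len(small_x)
    mean_x_small = sum(small_x) / n
    mean_y_small = sum(small_y) / n
    # Least-squares slope of y on x within the small sample.
    sxy = sum((x - mean_x_small) * (y - mean_y_small)
              for x, y in zip(small_x, small_y))
    sxx = sum((x - mean_x_small) ** 2 for x in small_x)
    slope = sxy / sxx
    mean_x_large = sum(large_x) / len(large_x)
    # Value predicted by the regression line at the large-sample mean of x.
    return mean_y_small + slope * (mean_x_large - mean_x_small)

# Hypothetical data in which y = 2x + 1 exactly.
estimate = regression_estimate(
    large_x=[1, 2, 3, 4, 5, 6],
    small_x=[1, 2, 3],
    small_y=[3, 5, 7],
)
print(estimate)
```

The small sample alone would give a mean of 5 for the primary character; the adjustment moves the estimate to the value the fitted line predicts at the better-determined mean of the secondary character.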

Subsampling. Cochran (Ref. 1) describes a method called
subsampling, in which a sampling unit may itself be enumerated by subsampling.
There might be a hierarchy of sampling units in multistage sampling; for
example, sampling units might be selected in the first stage of
randomization; within each such selected unit, smaller sampling units then
might be selected by another act of randomization, and so forth. This
special form of subsampling has been called “nested” sampling by 
Mahalanobis (Ref. 19). 
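
A two-stage version of this scheme might be sketched in Python as below. The population of blocks and households, the sample numbers, and the function name two_stage_sample are hypothetical; selection at each stage is without replacement and with equal probabilities, which is only one of several possible designs.

```python
import random

def two_stage_sample(population, n_primary, n_secondary, rng):
    """Select primary units at random, then elements at random within each.

    population : dict mapping a primary-unit label to its list of elements
    """
    # First act of randomization: choose the primary units.
    primaries = rng.sample(sorted(population), n_primary)
    sample = {}
    for unit in primaries:
        elements = population[unit]
        take = min(n_secondary, len(elements))
        # Second act of randomization: choose elements inside the unit.
        sample[unit] = rng.sample(elements, take)
    return sample

# Hypothetical population: three blocks of ten households each.
population = {
    "block A": list(range(0, 10)),
    "block B": list(range(10, 20)),
    "block C": list(range(20, 30)),
}
nested = two_stage_sample(population, n_primary=2, n_secondary=3,
                          rng=random.Random(0))
print(nested)
```

Further stages would repeat the inner step, treating each selected element as a unit to be subsampled in turn.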

The Selection of the Sampling System 

No simple principle exists which leads the investigator uniquely to 
the selection of a system of sampling. From the many sampling designs 
that can be constructed in order to answer the questions which prompted 
the research, one will be selected for application on the basis of the nature 
of the problem, the resources and the materials available or obtainable, 
and certain statistical and administrative considerations. 

From a statistical standpoint, the problem is to secure the best
estimate of the population characters chosen for study. On the basis of
knowledge of limiting distribution theory and of best linear unbiased 
estimates, it is the usual practice to take the standard deviation of the 
sample estimate about the character estimated as the measure of sampling 
error. The relative efficiency of different methods of estimation is 
obtained from the ratios of the reciprocals of the variances of sample 
estimates of the mean. The statistical criterion of efficiency is usually not 
the only basis of deciding upon the sampling plan. Another principal
consideration is the cost of the investigation.
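
The measure of relative efficiency described above amounts to a one-line calculation. As an illustration that is not taken from the text, the sketch below compares the sample median with the sample mean for normally distributed observations, using the standard large-sample variance of the median, pi times sigma squared over 2n. The function name and the values of n and sigma are hypothetical.

```python
import math

def relative_efficiency(variance_a, variance_b):
    """Efficiency of estimate A relative to estimate B: the ratio of the
    reciprocals of their sampling variances."""
    return (1.0 / variance_a) / (1.0 / variance_b)

n = 100
sigma = 1.0
variance_of_mean = sigma ** 2 / n
variance_of_median = math.pi * sigma ** 2 / (2 * n)   # large-sample approximation

efficiency = relative_efficiency(variance_of_median, variance_of_mean)
print(round(efficiency, 3))
```

An efficiency below 1 means that the first estimate needs a proportionately larger sample to match the precision of the second.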

The basis of planning, therefore, is the selection of a sample design 
which combines precision of the results and expenditures in such a manner 
that either the cost is a minimum for any specified precision or the
precision is a maximum for any assigned cost. Considerable work has been
done in recent years on the study of costs associated with the various 
sampling and estimating operations, including the determination of the 
relative magnitudes of variances and covariances between and within 
various kinds of sampling units.

Thus, although no complete theory with practical applicability is
available whereby the investigator always could be certain of selecting
the “best” sampling design and at the same time the “best” process of
estimation and allocation of sampling units, considerable empirical and
scientific knowledge is available upon which an intelligent selection can
be made. To a certain extent each field of study may have its own
peculiar sampling problems. But the principles so far educed have wide
and general application. Often an exploratory or pilot investigation
may save a good deal of time and unnecessary expense by providing useful
information on the cost and variance, or error functions. In addition,
the exploratory period can be used advantageously in giving training to 
workers in both field and statistical work and thus in controlling mistakes 
and errors arising from the human factor. 
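
Once a pilot investigation has supplied rough cost and variance figures, the balance between precision and expenditure can be worked out numerically. The sketch below assumes stratified random sampling and a linear cost function in which each unit in stratum k costs c_k. Under these assumptions, which are not stated in the text, a standard result of sampling theory is that the variance of the estimated mean for a fixed budget is least when the sample number in each stratum is proportional to N_k S_k / sqrt(c_k). All names and figures are hypothetical.

```python
import math

def cost_optimal_allocation(budget, stratum_sizes, stratum_sds, unit_costs):
    """Sample numbers per stratum that minimize the variance of the
    stratified mean for a fixed field budget (overhead already deducted)."""
    strata = list(zip(stratum_sizes, stratum_sds, unit_costs))
    # Each stratum's share is proportional to N_k * S_k / sqrt(c_k).
    shares = [N_k * s_k / math.sqrt(c_k) for N_k, s_k, c_k in strata]
    # Spending exactly the budget fixes the constant of proportionality.
    cost_of_shares = sum(N_k * s_k * math.sqrt(c_k) for N_k, s_k, c_k in strata)
    scale = budget / cost_of_shares
    return [scale * share for share in shares]

# Two strata alike in size and variability; the second costs four times
# as much per unit to enumerate.
costs = [1.0, 4.0]
allocation = cost_optimal_allocation(
    budget=600.0,
    stratum_sizes=[1000, 1000],
    stratum_sds=[10.0, 10.0],
    unit_costs=costs,
)
print(allocation)
```

When the unit costs are equal in all strata, the rule reduces to allocation proportional to the product of stratum size and standard deviation.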

Statistical Aspects of Sampling Designs

The statistical planning of the program for obtaining observations
from samples involves the problems of specification and estimation. The
mathematical form of the population is known or
assumed to be known, but the values of one or more parameters entering
into the form are unknown. Estimates of one or more parameters are
desired, each with minimum sampling error. 

In most statistical investigations by sample, a central problem is to 
ascertain the value of an average (Ref. 5). 

Consider a population $\pi$ with a parameter of location $\mu$ and of
dispersion $\sigma$. A sample $X_1, X_2, \ldots, X_n$ is drawn. A function of these
$X$'s, say $\mu'$, where

$$\mu' = m(X_1, X_2, \ldots, X_n) \tag{9.01}$$

is said to be a “mathematical expectation estimate” of $\mu$, if the mean
value of $\mu'$ in repeated samples is equal to $\mu$. Further, the estimate $\mu'$
may be said to be the best linear estimate of $\mu$, if it is linear with respect
to the $X$'s:

$$\mu' = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n + c_0 \tag{9.02}$$

and if its standard error is less than that of any other linear estimate of $\mu$.
The value of an average is

$$\bar{X} = \frac{\sum_k \sum_i X_{ki}}{\sum_k n_k} \tag{9.03}$$

where $n_k$ is the number of sampling units in the $k$th stratum; $X_{ki}$, the
value of the variate in the $i$th element of the $k$th stratum; $\sum(n_k)$ may be
known or unknown, finite or infinite. The major sampling processes
such as random sampling with or without replacement, stratified random 
sampling of individual elements, stratified random sampling of groups
or clusters, double sampling, and purposive sampling can be illustrated
and differentiated by the different grouping methods for each of which the
sum $\sum\sum(X_{ki})$ in Equation (9.03) is obtainable.

Insight as to the arrangement of strata and the average to compute 
has grown out of the study of the problem of estimation in stratified 
sampling of groups. In stratified sampling, (9.03) becomes

$$\bar{X} = \frac{\sum(n_k)(\bar{X}_k)}{\sum(n_k)} \tag{9.04}$$

where $\bar{X}_k$ equals the average value of $X$ in the $k$th stratum. In some
problems, it has been found that, by choosing the strata so that the
regression of $\bar{X}_k$ on some appropriately selected variate $Y$ is linear, an
improved estimate of $\bar{X}$ can be made (see Double Sampling, above).
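
That the stratified form (9.04) is the same average as the double sum over all elements in (9.03) can be verified with a few lines of Python. The three strata of values and the function names are hypothetical.

```python
def pooled_mean(strata):
    """Equation (9.03): the sum of all X_ki divided by the total count sum(n_k)."""
    total = sum(sum(values) for values in strata)
    count = sum(len(values) for values in strata)
    return total / count

def weighted_mean_of_strata(strata):
    """Equation (9.04): stratum means weighted by the stratum counts n_k."""
    counts = [len(values) for values in strata]
    means = [sum(values) / len(values) for values in strata]
    return sum(n_k * m_k for n_k, m_k in zip(counts, means)) / sum(counts)

# Hypothetical strata with 2, 3, and 1 elements.
strata = [[2.0, 4.0], [10.0, 12.0, 14.0], [20.0]]
print(pooled_mean(strata), weighted_mean_of_strata(strata))
```

The two agree because the sum of the values within a stratum equals the stratum count times the stratum mean.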

In general, there is no unique unbiased estimate of a parameter. 
Under particular conditions the best estimate can be found if the quantity 
is a linear function of the observations as in (9.02) above. A method and 
the conditions are given in a theorem by Markoff (Refs. 5 and 22). 

It is possible to make the obtained estimate the best linear estimate
if another stipulation about the variation of the $\bar{X}_k$'s in strata
corresponding to different fixed values of $Y$ and $n_k$ is fulfilled. Neyman (Ref. 22),
basing his method on Markoff's theorem, has indicated that the numbers
in the sample should be proportional to the product of the number of
sampling units in a stratum by the standard deviation of the measured
character within the stratum. The “best” estimate is defined by the two
conditions that (1) it should be a linear unbiased estimate with (2) 
minimum variance (see Equation 9.02). 

A fundamental condition in the best solution is that the total number 
of sampling units must be kept constant. In Neyman's method the best 
solution depends on a knowledge of the population standard deviation 
of each stratum. Sukhatme (Ref. 26) investigated the effect of
estimating the standard deviation of the different strata by a preliminary inquiry.
He concluded that a gain in efficiency takes place even in the case where
the population standard deviations, $\sigma_i$'s, are estimated from the sample
standard deviations, the $s_i$'s, that is, when the $\sigma_i$'s are different in
different strata.
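
The gain from Neyman's allocation over proportionate allocation, with the total number of sampling units held constant, can be checked numerically. The sketch below uses hypothetical stratum sizes and standard deviations, treats the standard deviations as known, and ignores the finite population correction. The two allocations of 90 units are written out by hand: 45, 27, 18 is proportional to stratum size, and 20, 30, 40 is proportional to size times standard deviation.

```python
def variance_of_stratified_mean(stratum_sizes, stratum_sds, allocation):
    """Variance of the stratified mean, ignoring the finite population
    correction: the sum over strata of W_k^2 * sigma_k^2 / n_k."""
    total = sum(stratum_sizes)
    return sum((N_k / total) ** 2 * s_k ** 2 / n_k
               for N_k, s_k, n_k in zip(stratum_sizes, stratum_sds, allocation))

sizes = [5000, 3000, 2000]          # hypothetical units per stratum
sds = [4.0, 10.0, 20.0]             # hypothetical standard deviations

proportionate = [45, 27, 18]        # 90 units, proportional to size
neyman = [20, 30, 40]               # 90 units, proportional to size x s.d.

v_proportionate = variance_of_stratified_mean(sizes, sds, proportionate)
v_neyman = variance_of_stratified_mean(sizes, sds, neyman)
print(round(v_proportionate, 4), round(v_neyman, 4))
```

The advantage disappears when the standard deviations are the same in every stratum, since the two allocations then coincide.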

Mahalanobis (Ref. 19) discusses in detail the statistical planning
involved in large-scale sample surveys taking into account both the cost
and variance functions for obtaining optimum solutions. A
comprehensive and critical review of recent statistical developments in sampling
and sampling surveys has been made by Yates (Ref. 28). 


Types of Error in Investigation by Sample

Statistical data are the raw material of judgments, comparisons, and
truth. The highly condensed form to which the original data are usually
reduced by processes of statistical reduction gives to the final results a
display of exactness that is not necessarily intrinsic. In viewing the 
final product, one should not forget the original material from which it 
came. In order to evaluate the findings from an investigation, much 
information is necessary as to the ways in which the original data were 
collected, the conditions surrounding them, and the kinds of errors to 
which they are susceptible. We wish to consider here the types of 
errors which are present in every study by sample. 

Random Sampling Errors. First, there are the random sampling
errors or sampling fluctuations dealt with in the theory of probability 
and in the theory of sampling distributions. They are the outcome of the 
random sampling process, and sampling theory enables us to estimate 
them when we know their form of distribution. Random sampling 
errors have the advantageous property that they can be controlled by 
regulating the design and size of the sample. We have considerable 
theoretical and experimental knowledge of this type of error. Often, 
however, particularly in sampling survey studies, this is the smallest 
error in the collected data. 

Systematic Errors. Apart from sampling fluctuations, errors also
originate from the unreliability of human observers, either in direct 
observation or in other forms of measurement. Errors of measurement 
are usually much greater in biological, psychological, economic, or social 
investigations than in the physical sciences. Insofar as observational 
errors originate unconsciously, they may more or less follow the normal 
distribution of errors so that positive and negative deviations would tend 
to cancel increasingly as the number of observations increases. It is a
mistake, however, to rely upon these errors' canceling one another. They
may often possess not only a random element but also a bias. A special
study needs to be carried out, either by repeating the observations or
measurements by the same observer or by more than one observer or by
some other type of control, and the results compared. In making some
observations, we are at times prone to dismiss as unessential conditions 
about which we think we know more or less. At times there may be 
justification for this attitude. It is good practice, however, to test the 
possibility of some circumstance as a cause by arranging the observations 
with respect to the circumstance. If the assumed cause is real, it is 
found that the errors of the observations display a regularity not found 
in chance errors. Wrong assumptions concerning the operation of some 
circumstance may bring about similar findings in calculations dealing 
with the results of observations. Errors of this type are called systematic 
errors. 

Miscellaneous Inaccuracies. Contrasting sharply with random
observational errors and sampling errors, inaccuracies may arise in a number of
ways. The worst of these originate from such practices as false entries
or entries by pure guess, deliberate violations of directions, or similar
gross negligence. Other milder forms of inaccuracies, but nevertheless of 
substantial significance, need to be considered. Variations in kind and 
degree occur, dependent on problem, field, and method. 

In making a house-to-house survey, it is obvious that much depends 
on the resourcefulness, skill, and reliability of the investigators. The 
kind of information obtained by asking questions on subjects that are 
poorly defined, or on matters of opinion, depends considerably on the 
form of questions asked. Sometimes the result of the inquiry is
conditioned by the investigation itself, as, for instance, when the person
interviewed may not have heard or thought of the subject before.

Much use of the questionnaire is made in collecting information from 
people who are not interested in statistics, and are often unwilling or 
unable to provide the information sought. People vary greatly in the 
trustworthiness of their returns, which are likely to be reasonably accurate 
only if the questions are few, well formulated, and easy to answer. 

In complicated and difficult investigations, trained and experienced 
workers are necessary if the information collected is to be relied upon. 
For instance, special training and very complete directions as to how the 
forms are to be filled in are given to census enumerators. 

Changed Conditions. Statistics extending over long periods are likely
to be influenced by changes that may have taken place, particularly by 
new knowledge that may have altered the basis of classification or the 
ordering of things into classes. Improved systems of coverage and 
enumeration may render difficult comparisons of census data collected 
in different decades. Uniformity and precision in classifying can be 
achieved only if very complete and explicit definitions are given.
Continuity is usually very significant in recorded statistics. In fact, at times
the statistician may prefer an existing practice, so as to ensure continuity 
of records, to improved procedures. At any rate, if changes need to be 
made, he will insist on the collection of two sets of data, at least for some 
time—one under the old plan, the other under the new, so that continuity 
may be preserved. 

This need for uniform conditions might be illustrated if an attempt 
were made to interpret the differences between the health status of men 
eligible for Army service in 1917 and in 1941. Such difficulties as the 
following would be likely to make any rigorous comparisons impossible: 
(1) the age groups are not identical, (2) the criteria for rejection are not 
the same, (3) changes in medical knowledge since 1917 have made
possible the development of greatly improved techniques for identifying
physical disabilities. 

The valid interpretation of final statistical results requires a knowledge 
of the conditions surrounding the events recorded at the place and time 
of observation. For instance, there are many limitations on the use of
physical-examination findings of selectees in World War II for drawing 
inferences concerning the general health status or the incidence of minor 
defects among the population (Ref. 14). The examinees at any induction 
station comprised a partly selected and widely variable sample of the 
male population at a specific time and place. The composition of the 
selectees chosen for examination was conditioned by (1) prevailing
Selective Service policies with respect to deferments for dependency, (2)
practices of the Armed Forces in regard to the acceptance of special
groups, (3) the extent of differential screening of local boards, and (4)
the number of men previously rejected who were sent up for
re-examination. The comparison, for example, of those individuals who were
rejected during the prewar period of Selective Service with those rejected 
at various periods during the war would require careful interpretation. 
The high rejection rates of the former do not necessarily imply a low 
level of national health. 

Differing Types of Canvass. Deming (Ref. 6) enumerated and
discussed 13 different factors that affect the usefulness of survey studies.
This comprehensive and informative discussion includes additional 
types of errors or additional properties of errors not hitherto discussed. 
Only brief consideration can be given to these. Information is needed 
with respect to differences in results obtained from different kinds and 
degrees of canvass, such as mail, telephone, telegraph, and interviews;
also from different types of questionnaires. Different results are obtained 
by the different sponsoring agencies under whose auspices the survey 
study is carried out. For example, studies on income and work status 
yield different results when conducted by relief organizations than when 
conducted by a government agency. Because of this bias, government 
and private organizations have at times contracted with other agencies 
for the collection of data. Cohen (Ref. 2) reports an instance where in 
China one census, taken for poll-tax and military purposes, showed a 
population of 28,000,000. Another census over the same territory, taken 
this time for famine relief, returned a population of 105,000,000. 

Changes in Population. There may be changes in the population in
the interval between the time of collection of data and their processing.
A sample may be more reliable than complete returns because of the
shorter period required for collecting and processing. Because
processing the data must commence at a certain date, replies received after this
deadline are not included. The late reports may be biased. A sample
study of these belated reports may at times determine whether bias is
present. The comparison of two or more samples of the same sampling
design or of subsamples within the main sample does not detect
“systematic error” inherent in the methods. If two samples agree it may
indicate not that they are devoid of bias but that their biases are
similar.

Unrepresentative Date. Bias can occur from an unrepresentative
choice of a date for a survey or a period to be covered. For instance, a 
passenger-traffic survey would not be representative if taken on or near 
a holiday date, nor would a school survey taken, say, the first week in 
June. Comparison of retail sales made in April, 1938, with those in 
April, 1937, gave spurious results, since the Easter holiday in 1937 came 
at the end of March, whereas in 1938 Easter occurred in the middle of 
April (Ref. 2). 

In Processing. Processing errors may result from differences among
workers in interpreting the wording of instructions, in editing, and in 
field work. Machine and tally errors need to be checked. 

Planning the Investigation 

It should be noted that even if a 100 per cent sample were taken 
there would still remain errors of certain kinds enumerated here, such as 
bias of nonresponse (omissions), errors of response, late reports, errors 
originating in the tabulation plans, bias from unrepresentative dates or 
periods, changes taking place in the population before tabulations become 
available, and errors in interpretation. Furthermore, even if there is 
100 per cent coverage, this is still a sample since at any other given time 
a new sample needs to be taken. 

In the planning of an investigation by sample the research worker 
attempts to make the best possible effort to control the errors to which 
his study is susceptible. The distribution of his effort should be determined
so that the greatest possible information will be obtained with the
funds available. In fact, preliminary consideration of all the errors 
to which the projected study is liable largely determines whether or not 
the investigation should be carried out. Once a decision to proceed has 
been taken, the reduction in error will be dependent upon the wise 
distribution of funds such that the more significant sources of error will 
receive the most attention. Bias, consistency, and efficiency are dependent
upon the system of sampling and estimation function used. The
theoretical distinction between types of errors to be expected is clear. 
Sampling error and observational error of the random type are capable of 
statistical control. The amount of sampling error to be expected can be 
determined for each particular type of sampling design and size of sample. 
If the amount of error that can be tolerated is known, then it is an unwise 
use of resources to take a larger sample than is necessary. 
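
As a rough sketch of this point, the sample size needed to keep the sampling error of a mean within a tolerated margin can be worked out before any data are collected. The formula used below is the usual one for simple random sampling with a finite-population correction; it is not stated in the text, and the function name, the multiplier 1.96 (roughly 95 per cent confidence), and the margin of 0.05 year are assumptions chosen only for illustration. The population size and standard deviation are those given for the state as a whole in Table 45.

```python
import math


def required_sample_size(sigma, margin, population_size, z=1.96):
    """Smallest n for which z * (standard error of the mean) <= margin.

    Assumes simple random sampling without replacement. The default
    z = 1.96 is an assumed multiplier, not a value taken from the text.
    """
    n0 = (z * sigma / margin) ** 2               # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population_size)    # finite-population correction
    return math.ceil(n)


# Illustrative call: ages of graduates, sigma = .7799 years, N = 24,395.
n_needed = required_sample_size(sigma=0.7799, margin=0.05, population_size=24395)
print(n_needed)
```

Under these assumptions any sample larger than the printed size buys precision that the stated tolerance does not require.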

Inaccurate instruments, the fallibility of human observers, defective 
techniques, biased methods of selecting data, and other such sources of 
systematic variation give errors which do not come within the scope 
of the classical theory of errors. These types of errors, therefore, need to 
be cared for largely by knowledge of and control of their sources. Systematic
errors in the data may be larger than errors due to sampling.
Except for the random factors that might balance out, further increase
in size of the sample does not increase the accuracy by eliminating
systematic errors. Nor would they disappear if complete enumeration
were resorted to.
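
A small numerical sketch may make this concrete. It assumes a simplified model that is not in the text: the estimate carries a constant additive bias plus a random sampling error whose variance is sigma squared over n, so that the root-mean-square error is the square root of bias squared plus sigma squared over n. The bias of 0.10 and the standard deviation of 0.78 are hypothetical values.

```python
import math


def root_mean_square_error(bias, sigma, n):
    """Total error of a sample mean under the assumed model:
    constant systematic bias plus random error with variance sigma**2 / n."""
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


# The random part shrinks as n grows; the systematic part does not.
for n in (100, 1_000, 10_000, 1_000_000):
    print(n, round(root_mean_square_error(bias=0.10, sigma=0.78, n=n), 4))
```

However large the sample is made, the total error in this sketch never falls below the assumed bias.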

It is an essential part of the sampling design to provide statistical 
controls for detecting and guarding against systematic types of errors. 
One way of doing this, for instance, in a sample survey is to collect two 
or more interpenetrating subsamples, which may be independent or 
partially linked together (Ref. 19). Such a simple control may not 
always suffice. It may be advisable, therefore, to arrange for the survey 
of the same sample, wholly or in part by two or more different workers. 
Just which sources of error are to receive the most attention will depend 
upon their importance in relation to the accuracy with which the study 
must be carried out in order to produce useful results with the funds 
available. This is a matter for the particular problem. Knowledge
of the actual conditions and the types of systematic variation likely to 
arise in them, and how they may be eliminated or reduced when necessary, 
is basic. 

The margin of error of the final estimate that can be tolerated if 
the conclusions drawn are to merit confidence must be considered in 
light of all kinds of errors to which the data are susceptible. The lack 
of accuracy and reliability in the data cannot, of course, be overcome by 
the subsequent statistical analysis that is applied. Thus, the task is first 
to secure data that are sufficiently precise for the purpose in hand and 
then to apply methods of analysis that make the best possible use of the 
information they contain. 

Procedures in Random Sampling 

In random and in representative sampling, a fundamental assumption 
is that the sample is random. Upon the fulfillment of this assumption 
rests the validity of the application of most of the statistical analysis. 
The objective measurement of errors of estimation and the determination 
of the significance of the sample results are dependent on the hypothesis 
of the randomness of the sampling errors. It is, therefore, of interest 
and importance to note what solution, if any, the statistician formulates 
so that he can proceed with confidence in his analysis. 

The information as to whether a sample is random is not available
through examination of the properties of the sample itself. This shortcoming
is illustrated by some of the hands which are obtained from dealing
at random from a pack of cards, for instance, a hand containing
13 diamonds. The criterion, therefore, of a random sample has to be
sought elsewhere, namely, in the process or method of selection. If a
random method of selection can be developed, then a random sample can
be simply defined as a sample which has been obtained by a random
method.

The concept of a population comprised of aggregates of things or
repeated events or phenomena is fundamental, since no collection of 
things can be thought of as random unless it in turn can be regarded as 
one of a set of such aggregates. Here it is assumed that a random set of 
objects means that the set was obtained by a random method. It is 
recalled that random sampling at the outset is designed to give every 
possible sample of given size an equal chance of being the actual sample. 
A requisite of a random sampling method is that it should be independent 
of the characteristics of the population under investigation. Since this 
definition of random selection refers to the specific character under study, 
it is evident that the random method in itself cannot be thought of
separately from the population the individuals of which are being chosen.
A method might be random for one population and not for another. In
fact, a method random for one characteristic of a population might not
be so for another. Kendall (Ref. 11) illustrates this point by citing the 
problem of sampling in a planned town by taking every tenth house. 
This procedure might give a random sample, but if every tenth house 
should be a corner house, the sample might or might not be random, 
depending on the character that was being studied. It would probably 
preserve its randomness if, for instance, the study was to determine the 
proportion of inhabitants with blue eyes, but probably would lose it if 
the investigation was concerned with estimating the distribution of 
incomes. 

No objective means is available for completely satisfying the requirement
of independence between method of sampling and characteristic.
Such means would require complete information about the population, 
which, if available, would of course render investigation unnecessary. 
Confidence in the fulfillment of independence must rest more or less in 
the actual state of our knowledge at the time on an a priori basis. 

In practice, it is often essential to choose a sample random in relation
to all properties of the population. This might appear to be an impossible
task, since, as has been pointed out, it is in the very nature of the
problem to sample the population according to at least one of its characteristics.
This seeming predicament was removed by superimposing a
new characteristic on the universe and sampling in accordance with it.
The most useful characteristic that can be superimposed on an existent 
universe is that of ordinal number. If the universe can be enumerated, 
then the problem of random sampling becomes fundamentally that of 
discovering a series of random numbers. 

The customary way is to number the universe in any practical manner,
whether or not related to its properties, and then to look for a set of
numbers so that they constitute a random aggregate from the possible
ordinal numbers of the universe. Thus, rather than the requirement of
determining in each case whether a sampling method is independent of the
characteristics of the population, it became necessary only to construct
a set of digits capable of giving a random sample of any size from any 
finite set of integers. Under such conditions it may be expected that the 
arrangement of digits in the sampling numbers will not be associated 
with the characteristics of the universe. Such was the principle upon 
which sets of “random sampling numbers” have been compiled. 

Kendall (Ref. 11) specifies certain requirements, other than that of
having been chosen at random, that a set of random sampling numbers
must satisfy if it is to be used for random sampling. Each digit in a set
of N random sampling numbers is expected to occur in N/10 cases and
each pair of digits to occur an equal number of times. He speaks of a
set with such properties as locally random and gives four tests, which
are necessary but not sufficient, to determine the existence of local
randomness:

(1) The frequency test. Each digit should occur an approximately 
equal number of times. 

(2) The serial test. There should be no tendency for a digit to be
followed by any particular digit.

(3) The poker test. There will be certain expectations to be satisfied 
for digits to be arranged in blocks of, say, five, four, three, and 
so on. 

(4) The gap test. There are certain expectations to be satisfied with 
respect to the gaps occurring between the same digits in the 
series. 
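
The first two tests can be sketched in a few lines. The version below is a simplification and not Kendall's own computation: it forms a chi-square statistic from the ten digit counts (frequency test) and from the hundred counts of adjacent ordered pairs (serial test). The function names are hypothetical, and because overlapping pairs are not independent, the serial statistic should be compared with the chi-square table only as an approximation.

```python
from collections import Counter


def frequency_chi_square(digits):
    """Chi-square statistic comparing each digit count with the expected N/10."""
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def serial_chi_square(digits):
    """Chi-square statistic over the 100 possible ordered pairs of adjacent digits."""
    pairs = Counter(zip(digits, digits[1:]))
    expected = (len(digits) - 1) / 100
    return sum((pairs.get((a, b), 0) - expected) ** 2 / expected
               for a in range(10) for b in range(10))


# A perfectly regular series 0, 1, ..., 9, 0, 1, ... is far from random.
regular = list(range(10)) * 50
print(frequency_chi_square(regular))   # every digit occurs exactly N/10 times
print(serial_chi_square(regular))      # but each digit is always followed by the next
```

The regular series satisfies the frequency test exactly and fails the serial test badly, which illustrates why no single one of the four tests is sufficient.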

There are two sets of random sampling numbers in common use, 
Tippett’s (Ref. 27) and Fisher and Yates’s (Ref. 7). A third set has been 
published by Kendall and Smith (Ref. 12). Tippett compiled his set
by drawing 41,600 digits at random from census reports and by combining
in 4’s to give 10,400 four-figured numbers. They have been subjected
to a number of inquiries in which they have met the criteria of
randomness used. Fisher and Yates’s set of random numbers was
constructed from the fifteenth and nineteenth digits of A. J. Thompson’s 
20-figure logarithmic table. The authors present tests of its randomness. 

Each of the compilations is accompanied by a number of illustrations 
of its use. If, for instance, a random sample is wanted from a list or 
roster of names, the procedure would be as follows: First each sampling 
unit is numbered in any way, systematic or otherwise. The tables are 
then opened at random and starting at any point and proceeding in any 
direction, such as up or down the columns, along the rows, or by some 
other predetermined plan, a sufficient number of pairs of digits or other 
combinations are taken to make up the predetermined size of the sample. 
Whenever a number occurs a second or later time, the repetition is simply
ignored. All numbers which exceed the total number of sampling units are also
ignored.
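
The procedure just described can be sketched as follows. The function name and the string of digits are hypothetical; in practice the digits would be read from a published table of random sampling numbers in some predetermined order.

```python
def draw_sample(digit_stream, population_size, sample_size):
    """Read fixed-width numbers from a string of random digits.

    Numbers outside 1..population_size are skipped, and so are numbers
    that have already been drawn.
    """
    width = len(str(population_size))   # digits needed to cover every unit
    chosen = []
    seen = set()
    for start in range(0, len(digit_stream) - width + 1, width):
        number = int(digit_stream[start:start + width])
        if 1 <= number <= population_size and number not in seen:
            seen.add(number)
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("digit stream exhausted before the sample was complete")


# An arbitrary digit string used only for illustration.
digits = "0347437386369647366146986371629774246762"
print(draw_sample(digits, population_size=50, sample_size=5))  # [3, 47, 43, 36, 46]
```

Here the pairs 73, 86, 96, and 61 are passed over because they exceed 50, and the second occurrences of 47 and 36 are passed over as repetitions.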

Other methods of drawing random samples are used, such as using 
coins, dice, roulette wheels, or cards. Great care must be taken, however,
to avoid bias in using such mechanical means. The human being
has been shown to be especially incompetent to make a random selection. 
The problem of selecting a random sample has been greatly simplified 
by the preparation of tables of random sampling numbers. When the 
rules of the game are scrupulously observed, their use likely gives the 
best guarantee now available of obtaining a random sample. 

A Comparative Experiment in Sampling Methods 

In order to illustrate some of the principles underlying sampling
procedures that have been discussed in this chapter, an experiment was
carried out. Its findings are presented herewith.

We have a finite population consisting of 24,395 high school graduates 
whose ages were given as of the nearest birthday at the time of graduation. 
They have been classified according to sex and location of high school, as 
given in Table 45. The means and standard deviations in years for the 
total population and for each of the four subclasses are also recorded in 
Table 45. 


TABLE 45

Ages of 1933-1944 High-School Graduates in Public Schools of Minnesota
Classified according to Sex and Size of Locality*

           State        Outside 3 cities         3 cities of
           as a          of first class          first class
Age        whole        Boys       Girls        Boys       Girls

15            84          26          43           6           9
16         1,585         457         812         115         201
17         8,729       2,486       3,870         930       1,443
18        12,148       3,269       4,726       1,667       2,486
19         1,562         352         637         239         334
20           216          56          73          46          41
21            71          22          19          22           8

Total     24,395       6,668      10,180       3,025       4,522
Mean       17.59       17.56       17.53       17.74       17.68
S.D.       .7799       .7763       .7903       .7848       .7352

*State of Minnesota, Department of Education, Statistical Division, December,
1944.


We shall assume that we wish to estimate the age of high-school 
graduates by taking a sample of 1,000 from the total population of 
24,395. We shall use three different methods of selecting the sample by
assuming that each age group is (1) evenly distributed among the subclasses,
(2) stratified proportionately to the sizes of subclasses, and (3)
stratified proportionately to the products of the sizes and standard
deviations of the subclasses.

First we shall describe the method of drawing the sample of 1,000 
graduates from this population as a whole. 

The first step was to assign a five-place number to each element of 
the population (see Table 46). 

TABLE 46

Assignment of Random Sampling Numbers
to the 24,395 Graduates of Table 45

Age      Numbers

15       00,001-00,084
16       00,085-01,669
17       01,670-10,398
18       10,399-22,546
19       22,547-24,108
20       24,109-24,324
21       24,325-24,395

The second step was to read Fisher and Yates’s Table of Random 
Sampling Numbers (Ref. 7), page by page, first horizontally and then 
vertically. Each time five consecutive figures were read; they constituted
a five-place number which was then referred to Table 46 to give
the element an age score. Whenever a number larger than 24,395 was 
obtained, it was discarded. In this way, we formed a sample of 1,000 
as indicated in Table 47. 
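
The look-up against Table 46 can be sketched as a search in the cumulative frequencies of Table 45. The names below are hypothetical; the frequencies are those of the "State as a whole" column, and the cumulative totals reproduce the upper ends of the number ranges in Table 46.

```python
from bisect import bisect_left

AGES = [15, 16, 17, 18, 19, 20, 21]
FREQUENCIES = [84, 1585, 8729, 12148, 1562, 216, 71]   # Table 45, state as a whole

# Upper end of the number range assigned to each age: 84, 1669, 10398, ...
UPPER_BOUNDS = []
running_total = 0
for frequency in FREQUENCIES:
    running_total += frequency
    UPPER_BOUNDS.append(running_total)


def age_for_number(number):
    """Return the age assigned to a five-place number, or None if it is discarded."""
    if not 1 <= number <= UPPER_BOUNDS[-1]:
        return None                      # 00,000 or anything above 24,395
    return AGES[bisect_left(UPPER_BOUNDS, number)]


print(age_for_number(10398), age_for_number(10399), age_for_number(24396))
```

The three calls fall at the end of the range for age 17, at the start of the range for age 18, and just beyond the last assigned number.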

TABLE 47

A Sample of 1000 Drawn by the Method
of Random Sampling Numbers

           State
Age        as a whole

15              4
16             63
17            378
18            486
19             53
20             11
21              5

Total       1,000


The final step was to stratify this sample of 1,000 according to the 
three methods enumerated above. 

The first method, that of stratification with no restriction, was very 
simple. We simply split each age group into four subgroups as reported 
in Table 48. 

TABLE 48

Stratification of the Sample of 1000 Graduates with No Restrictions

           Outside 3 cities          3 cities
            of first class        of first class
Age        Boys       Girls       Boys       Girls

15         1.00        1.00       1.00        1.00
16        15.75       15.75      15.75       15.75
17        94.50       94.50      94.50       94.50
18       121.50      121.50     121.50      121.50
19        13.25       13.25      13.25       13.25
20         2.75        2.75       2.75        2.75
21         1.25        1.25       1.25        1.25

In using the second method, that of stratification proportionate
to the total number in the population in each of the four subclasses, we
needed first to compute the proportions of the four subclasses. Let us
denote by $N_1$ and $N_2$ the numbers of boys and girls respectively, outside
the three cities; by $N_3$ and $N_4$, the numbers of boys and girls respectively,
inside the three cities. Then we calculate:

$$
\begin{aligned}
N_1 : N_2 : N_3 : N_4 &= 6{,}668 : 10{,}180 : 3{,}025 : 4{,}522 \\
&= \frac{6{,}668}{24{,}395} : \frac{10{,}180}{24{,}395} : \frac{3{,}025}{24{,}395} : \frac{4{,}522}{24{,}395} \\
&= .2733 : .4173 : .1240 : .1854
\end{aligned}
$$

Each age group was then split according to this ratio. The resultant 
stratification is reported in Table 49. 
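
The split of one age group can be sketched as below. The dictionary keys and the function name are hypothetical labels; the proportions are rounded to four decimals before multiplying, which is an assumption made here so that the figures agree with those printed to two decimals in Table 49.

```python
STRATUM_SIZES = {
    "outside, boys": 6668,
    "outside, girls": 10180,
    "cities, boys": 3025,
    "cities, girls": 4522,
}


def proportional_split(count, stratum_sizes):
    """Divide `count` among strata in proportion to their population sizes.

    Proportions are rounded to four decimals first (an assumption that
    mirrors the rounded ratio used in the text).
    """
    total = sum(stratum_sizes.values())
    return {name: count * round(size / total, 4)
            for name, size in stratum_sizes.items()}


# The 378 sampled graduates aged 17 (Table 47).
for name, share in proportional_split(378, STRATUM_SIZES).items():
    print(name, round(share, 2))
```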

TABLE 49

Stratification of the Sample of 1000 Graduates according to Proportionate
Numbers in the Population Strata

           Outside 3 cities          3 cities
            of first class        of first class
Age        Boys       Girls       Boys       Girls

15         1.09        1.67       0.50        0.74
16        17.22       26.29       7.81       11.68
17       103.31      157.74      46.87       70.08
18       132.82      202.81      60.26       90.10
19        14.48       22.12       6.57        9.83
20         3.01        4.59       1.36        2.04
21         1.37        2.09       0.62        0.93

In using the third method, that of stratification proportionate to the
product of the numbers and standard deviations in the four subclasses, we
also first have to compute the proportions of the $N_i\sigma_i$'s of the four subclasses.
Let us assume that $N_1$, $N_2$, $N_3$, and $N_4$ have the same notation as
used before. Denote by $\sigma_1$ and $\sigma_2$ the standard deviations of the ages of
boys and girls respectively, outside the three cities; by $\sigma_3$ and $\sigma_4$, the
respective standard deviations inside the three cities. Then we calculate:

$$
\begin{aligned}
N_1\sigma_1 : N_2\sigma_2 : N_3\sigma_3 : N_4\sigma_4
&= 6{,}668(.7763) : 10{,}180(.7903) : 3{,}025(.7848) : 4{,}522(.7352) \\
&= 5{,}176 : 8{,}045 : 2{,}374 : 3{,}325 \\
&= \frac{5{,}176}{18{,}920} : \frac{8{,}045}{18{,}920} : \frac{2{,}374}{18{,}920} : \frac{3{,}325}{18{,}920} \\
&= .2736 : .4252 : .1255 : .1757
\end{aligned}
$$

Each age group was then split according to this ratio. The resulting 
stratification is reported in Table 50. 
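
The arithmetic behind this ratio can be checked with a short sketch. The function name is hypothetical; the products are rounded to whole numbers and the proportions to four decimals, following the rounding that appears in the calculation above.

```python
SIZES = [6668, 10180, 3025, 4522]            # N_1 .. N_4 from Table 45
SDS = [0.7763, 0.7903, 0.7848, 0.7352]       # sigma_1 .. sigma_4 from Table 45


def size_times_sd_proportions(sizes, sds):
    """Return the rounded products N_i * sigma_i and their shares of the total."""
    products = [round(n * s) for n, s in zip(sizes, sds)]
    total = sum(products)
    proportions = [round(p / total, 4) for p in products]
    return products, proportions


products, proportions = size_times_sd_proportions(SIZES, SDS)
print(products)      # [5176, 8045, 2374, 3325]
print(proportions)   # [0.2736, 0.4252, 0.1255, 0.1757]
```

Because the four standard deviations are nearly equal, these proportions lie close to the purely proportionate ones, .2733 : .4173 : .1240 : .1854.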


TABLE 50

Stratification of the Sample of 1000 Graduates Proportionate to the Product
of the Numbers and Standard Deviations in the Population Strata

           Outside 3 cities          3 cities
            of first class        of first class
Age        Boys       Girls       Boys       Girls

15         1.09        1.70       0.50        0.70
16        17.24       26.79       7.91       11.07
17       103.42      160.73      47.44       66.41
18       132.97      206.65      60.99       85.39
19        14.50       22.54       6.65        9.31
20         3.01        4.68       1.38        1.93
21         1.37        2.13       0.63        0.88


We then tested the goodness of fit for the three kinds of stratification
by using the $\chi^2$ criterion. Before doing this we needed to compute the
theoretical expectations of frequencies for each age group in each subclass
if we drew a sample of 1000 exactly representative of the parent population.
The calculations of the theoretical expectations are reported in Table 51.

The test of the goodness of fit of the method of randomization without
restrictions gave a value of $\chi^2$ = 262.2836. Referring to the $\chi^2$ table
with 18 degrees of freedom, we find that P < .001. Therefore, we conclude
that this kind of stratification is not a good fit to the theoretical
expectations.
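
The criterion itself can be sketched in a few lines. The function name is hypothetical, and the example uses only the seven cells for boys outside the three cities (observed values from Table 48, expectations from Table 51), so the printed figure is one part of the total and not the value reported in the text. The 18 degrees of freedom presumably arise as (7 - 1)(4 - 1) for seven ages and four subclasses.

```python
def chi_square(observed, expected):
    """Pearson's chi-square statistic: the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Boys outside the three cities, ages 15 to 21.
observed = [1.00, 15.75, 94.50, 121.50, 13.25, 2.75, 1.25]     # Table 48
expected = [1.07, 18.73, 101.91, 134.00, 14.43, 2.30, 0.90]    # Table 51
print(round(chi_square(observed, expected), 4))
```

Summing the same quantity over all 28 cells gives the statistic that is referred to the table.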

The test of goodness of fit of the distribution of observed values from
the method of stratification according to proportionate numbers and the
theoretical distribution gave a value of $\chi^2$ = 20.1521. Referring to the
$\chi^2$ table with 18 degrees of freedom, we find that the corresponding
value of P > .30. Therefore, we conclude that the stratification proportionate
to subclass numbers in this case is a good fit to the theoretical
expectations.

TABLE 51

Calculation of the Theoretical Expectations of Frequencies for Each Age
Group of Each Subclass for a Representative Sample of 1000 Graduates

                 F                Per Cent        f_t
           (Frequency of         (F/24,395)      (Theoretical fre-
Age         population)                           quency: % x 1000)

Outside three cities, boys
15               26                .00107               1.07
16              457                .01873              18.73
17            2,486                .10191             101.91
18            3,269                .13400             134.00
19              352                .01443              14.43
20               56                .00230               2.30
21               22                .00090               0.90

Outside three cities, girls
15               43                .00176               1.76
16              812                .03329              33.29
17            3,870                .15864             158.64
18            4,726                .19373             193.73
19              637                .02611              26.11
20               73                .00299               2.99
21               19                .00078               0.78

Three cities, boys
15                6                .00025               0.25
16              115                .00471               4.71
17              930                .03812              38.12
18            1,667                .06833              68.33
19              239                .00980               9.80
20               46                .00188               1.88
21               22                .00090               0.90

Three cities, girls
15                9                .00037               0.37
16              201                .00824               8.24
17            1,443                .05915              59.15
18            2,486                .10191             101.91
19              334                .01369              13.69
20               41                .00168               1.68
21                8                .00033               0.33

Total        24,395               1.00000            1000.00

From the test of the goodness of fit for the method of stratification
according to the product of the numbers and standard deviations in the
subclasses, we found $\chi^2$ = 18.1743. Referring to the $\chi^2$ table with
18 degrees of freedom, we find that the corresponding value of .50 > P
> .30. Therefore, we conclude that the stratification proportionate to
the products of subclass numbers and standard deviations in this case
is a good fit to the theoretical expectations. It is noted from Table 45
that the subclass standard deviations are all of the same magnitude.
Hence, the ratio $N_1\sigma_1 : N_2\sigma_2 : N_3\sigma_3 : N_4\sigma_4$ differs very little from the
ratio $N_1 : N_2 : N_3 : N_4$. We do, however, note a reduction in $\chi^2$ in taking
into account the subclass standard deviations.

Problems 

1. Work out a sampling design for securing data about the number of 
students enrolled in the several high-school subjects in your state. 

2. Design a sampling survey for obtaining data concerning promotion 
policies for teachers in the elementary schools of your state. 

3. Secure a representative sample of schools to engage in a cooperative
experiment testing the relative efficacy of different curricular practices
in secondary schools.

4. Design a sample survey for securing the best estimate of student 
enrollment in institutions of higher education in the United States; 
this information to be made available within a month after the 
opening of the institutions in the fall. 

5. Set up a plan for a survey by sample of the attitude of the public
toward Federal support of education to equalize educational opportunities.

6. Set up a sample of schools in your state which can be used recurrently 
for the collection of school statistics. Design the sample so that 
designated portions of the schools are taken out each year and new 
schools added so that no school carries an excessive burden. 

7. Compare a method of sampling with the method of complete survey 
for a specified educational problem with respect to cost and time 
required to issue the results. 

8. What recent developments have taken place in the techniques of
questionnaire construction, in procedures in carrying out the interview,
and in bringing about maximum returns from prospective
respondents?

9. What methods have been developed to control error in the processing 
of survey data? 

10. How can developments taking place in electrical and electronic 
equipment be applied to large sample surveys? 

11. Suggest methods based on statistical and research principles which
could be used for improving and standardizing procedures for collecting
school statistics in your state.

12. Evaluate the sampling procedures used in Kinsey, Alfred C., Pomeroy,
Wardell B., and Martin, Clyde E., Sexual Behavior in the
Human Male. Philadelphia: W. B. Saunders Company, 1948.

13. Criticize the sampling methods used in the Revision of the
Stanford-Binet Scale. See Marks, Eli S., “Sampling in the Revision of the
Stanford-Binet Scale,” Psychological Bulletin, Vol. 44 (1947), pp.
413-434.

14. Compare the relative efficiency of the three different sampling
methods described in the text for estimating the ages of high-school 
graduates by calculating the mean and standard deviation for each 
method and comparing these estimates with the population values. 
Calculate the sampling errors for each method (See Note in Problem 
15). 

15. Specify methods of forming estimates and calculating sampling 
errors for each of the following sampling methods: 

(a) Random sampling (no restrictions) 

(b) Stratified sampling 

(c) Cluster sampling 

(d) Sub-sampling 

(e) Stratification for two or more factors 

(f) Balancing 

Note: This problem should be postponed until the student has
studied the techniques of analysis of variance and covariance.

CHAPTER X

ANALYSIS OF VARIANCE AND COVARIANCE 


The analysis-of-variance technique developed by R. A. Fisher and
first reported in 1923 (Ref. 7) constitutes a method capable of analyzing the
variation to which experimental and observational material is subject so
that an assessment of the various components of variation can be made.
Since its introduction, the analysis of variance has become more and
more useful to large numbers of research workers in many fields. Fisher’s
technique is the only efficient one so far developed by which it is
possible to differentiate the variation according to causes or groups of
causes and to interpret the significance of a number of components
simultaneously.

The modern advances in experimental and sampling designs have 
become possible through the development of exact tests of significance 
and of the analysis of variance. Without these tools, the assessment 
of the components of variation traceable to the sources specified by the 
experimental or sampling design would be a very involved and difficult 
enterprise. Fisher (Ref. 4) describes the analysis of variance as used
in the analysis of experimental results as a simple arithmetical procedure
for arranging and presenting the experimental results in a single compact
table. This form of presentation shows both the structure of the
experiment illustrated by the division of the number of degrees of freedom,
and the relevant results arranged conveniently for the application
of the necessary tests of significance.

The Analysis of Variation. Assume that we have a measure of a
characteristic whose value is specified by the letter X. This value of X
usually varies from one individual to another or for repeated measurements
of the same individual. In general, the variation is due to a large
number of different factors or causes. Of these factors some may be
capable of identification and therefore may be called assignable causes
of variation. However, there are usually numerous other causes which
cannot be segregated because of our ignorance concerning them. These
are spoken of as chance causes. As we gain in knowledge, more and
more factors become assignable until the phenomenon can be completely
explained if we can identify all the factors giving rise to the variation.
The contribution of the known and unknown factors to the quantity X
may be regarded, at least to a first approximation, as additive in character
and may be represented symbolically thus:

$$X = a + b + c + \cdots + z \qquad (10.01)$$

where a, b, c, . . . denote the respective contributions of the known
factors A, B, C, . . . , and z represents the residual or the portion of X
attributable to chance or unknown factors. If, for instance, the factors
A, B, C, . . . can be maintained under complete control, their respective
contributions a, b, c, . . . will continue to be constant, whereas the
fluctuations from unit to unit in X will be entirely attributable to the
variation in z.

In experimental work various hypotheses may be advanced with
respect to the effect of one or more factors, namely, A, B, C, . . . , and
experimental designs are prepared to make the best determination of the
presumed effects, a, b, c, . . . . The measures obtained of the presumed
effects need to be tested with a view to determining their significance.
If the measured effects are real, that is, traceable to the origin specified
by the particular experimental design, the experimental results would be
characterized as heterogeneous in variation. If, however, the variation
presumably contributed by the several independent contributions of the
factors A, B, C, . . . would be only of the order of magnitude of the
effect assigned to the random sources of variation, the conclusion would
be that the presumed effects were not real but attributable to random
causes. The variation in the experimental material would then be
spoken of as homogeneous. That is, for variation to be strictly
homogeneous, it must be purely random, caused by a multiplicity of minor
independent factors, incapable of resolution into more elemental form
and indistinguishable one from another.

Hence, the fundamental problem in studies of variation is to be able 
to differentiate the variation and to trace each contributing factor or 
group of factors to its source. Although an analysis of this kind is of 
special significance in experimental work, there are many situations in
research work where differentiation of sources of variation in observational
data is an essential part of the analysis. A general problem is
that of determining whether two or more samples may be regarded as 
random samples from the same homogeneous population. 

An Application of Analysis of Variation. We shall illustrate the main
ideas of the above discussion by presenting an example. Let us take the 
data recorded in Table 52 which represent the mental ages in months of 
6 samples of 6 pupils each, each randomly chosen from the same grade in 
6 different urban schools. Suppose that the data are required to answer 
the question: Is there evidence that the mental ages of the pupils are the 
same for the same grade in the 6 schools? 

TABLE 52

Mental Ages of 6 Random Samples of 6 Pupils Each from 6 Different Schools

                      Mental ages in schools
Individual       1      2      3      4      5      6

1              158    156    160    159    164    153
2              157    155    158    155    163    148
3              153    154    156    148    162    145
4              151    153    155    147    160    144
5              144    151    150    146    154    144
6              143    149    145    145    151    136

Mean           151    153    154    150    159    145

Grand mean = 152

The variability in the mental ages of the pupils from the same school
is so considerable that it would be hard to reach a conclusion on the point
at issue from a mere inspection of the data in the table. Figure 5
brings out the situation more clearly; but even after examining it can we
say that the differences in the means are significant? It is at this point
that statistical theory can give assistance by determining how much
consideration should be given to the apparent differences in means, which
are hard to discern because of the residual fluctuation, z, due to chance
causes. Specifically, the question is: What is the probability that the
observed differences in the mean values of the 6 schools might have
arisen simply through random sampling errors?

[Figure 5. The components of variation in the mental ages of 36 pupils.
Vertical axis: M.A.; horizontal axis: Sample Number, 1 to 6.]

Statistical means enable us to make the calculation of the probability
value. Since the form of the statistical test to be described later depends
to a considerable extent on the nature of the random variation represented
by z, it may be well to point out here the following assumptions: the
random elements, in successive observations, are independent of each
other and of the values of the assignable factors, if these exist; and they
are normally distributed about zero with equal standard deviation.

We shall briefly describe the model the statistician sets up for the
description of the situation discussed here. If the X of Equation (10.01)
represents the mental age of a single individual, then a on the right-hand
side may be considered as a general average or mean of the individuals
in all the samples, and b as a contribution, positive or negative, associated
with a particular sample. If there are changes in mental age from
sample to sample which affect X, then the values of b, namely,
$b_1, b_2, \ldots, b_6$, for the 6 samples, will differ; if there are no such changes, then

$$b_1 = b_2 = \cdots = b_6 = 0 \qquad (10.02)$$

The random or residual variations, z, among the mental ages of individuals
from the same school obscure the real situation about the true
value as estimated from the sample. Hence, it is not possible to take the
difference between the observed sample mean and the grand mean as
equal to $b_t$ ($t = 1, 2, \ldots, 6$). Therefore, it becomes necessary to
answer the question: Taking into account the observed variation among
mental ages of individuals in the same school sample, what is the probability
that the 6 obtained sample means would differ so much among
themselves because of random sampling fluctuations if, in fact, Equation
(10.02) were true?

The method used by the statistician to solve this problem is outlined 
below. 

Let $X_{ti}$ be the mental age score of the ith individual in the tth sample;
$i = 1, 2, \ldots, 6$; also $t = 1, 2, \ldots, 6$. $\bar{X}_t$ is the mean of the observations
in the tth sample and $\bar{X}$ is the grand mean of the 36 observations.
As illustrated for one individual from the third sample in Diagram 5, the
mental age score of the ith individual in the tth sample may be considered
as the sum of three components. Thus:

$$X_{ti} = \bar{X} + (\bar{X}_t - \bar{X}) + (X_{ti} - \bar{X}_t) \tag{10.03}$$

For example, the mental-age score (164) of the first individual in the
sample from the fifth school is equal to 152 + (159 - 152) + (164 - 159).
Referring to Equation (10.01), $\bar{X}$ may be considered as an estimate of a;
$(\bar{X}_t - \bar{X})$ as an estimate of $b_t$; and $(X_{ti} - \bar{X}_t)$ of the residual variation $z_{ti}$.
These are estimates because we have observations only from a random
sample from each of the schools.

The significance of the difference $\bar{X}_t - \bar{X}$ ($t = 1, 2, \ldots, 6$) or the
acceptance of the hypothesis represented by Equation (10.02) is based
on the magnitude of the components $\bar{X}_t - \bar{X}$ compared with $X_{ti} - \bar{X}_t$.
A precise statistical test of the significance involves the use of the following
identity:
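$$
\begin{aligned}
\sum_t \sum_i (X_{ti} - \bar{X})^2
 &= \sum_t \sum_i \left[(X_{ti} - \bar{X}_t) + (\bar{X}_t - \bar{X})\right]^2 \\
 &= \sum_t \sum_i (X_{ti} - \bar{X}_t)^2 + \sum_t \sum_i (\bar{X}_t - \bar{X})^2
    + 2 \sum_t \sum_i (X_{ti} - \bar{X}_t)(\bar{X}_t - \bar{X}) \\
 &= \sum_t \sum_i (\bar{X}_t - \bar{X})^2 + \sum_t \sum_i (X_{ti} - \bar{X}_t)^2
\end{aligned}
\tag{10.04}
$$

since the product term will vanish because $\sum_i (X_{ti} - \bar{X}_t) = 0$.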
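The decomposition (10.03) and the identity (10.04) can be checked numerically on the Table 52 data. The following is only an illustrative sketch; the variable names are ours and do not come from the text.

```python
# Mental ages from Table 52: one inner list per school (sample t), six pupils each.
schools = [
    [158, 157, 153, 151, 144, 143],
    [156, 155, 154, 153, 151, 149],
    [160, 158, 156, 155, 150, 145],
    [159, 155, 148, 147, 146, 145],
    [164, 163, 162, 160, 154, 151],
    [153, 148, 145, 144, 144, 136],
]

n_total = sum(len(sample) for sample in schools)
grand_mean = sum(sum(sample) for sample in schools) / n_total
school_means = [sum(sample) / len(sample) for sample in schools]

total_ss = between_ss = within_ss = cross = 0.0
for mean_t, sample in zip(school_means, schools):
    for x in sample:
        b_hat = mean_t - grand_mean   # estimate of b_t
        z_hat = x - mean_t            # estimate of z_ti
        # Equation (10.03): each score is the sum of three components.
        assert x == grand_mean + b_hat + z_hat
        total_ss += (x - grand_mean) ** 2
        between_ss += b_hat ** 2
        within_ss += z_hat ** 2
        cross += z_hat * b_hat        # product term of (10.04)

print(grand_mean, total_ss, between_ss, within_ss, cross)
```

The product term comes out as zero, so the total sum of squares splits exactly into the "between" and "within" parts.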


Before the magnitude of the two components can be compared, they
must be divided by the quantities known as the number of degrees of
freedom, which are r and $N - q$, respectively, where r is the number of
relations used to define the hypothesis, that is, 5 in this problem. There
are 6 independent values; therefore, q = 6. If the hypothesis tested is
true, then 5 relations hold among the 6 parameters, namely, $b_1 = 0$,
$b_2 = 0$, $b_3 = 0$, $b_4 = 0$, $b_5 = 0$. Thus, r = 5; $N - q = 36 - 6 = 30$.

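The criterion is

$$F = \frac{\sum_t \sum_i (\bar{X}_t - \bar{X})^2 / r}{\sum_t \sum_i (X_{ti} - \bar{X}_t)^2 / (N - q)} \tag{10.05}$$

or

$$z = \tfrac{1}{2} \log_e F \tag{10.06}$$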


Using the Tables of F or z, respectively, we obtain the 5 per cent and 1 per 
cent levels of significance against which the obtained value of F or z is 
checked. 

The numerical solution for the example is carried out as follows: 
First, it is convenient to reduce the values in Table 52 by subtracting 
150 from each value obtaining the following: 


Individual      1      2      3      4      5      6    Total
    1           8      6     10      9     14      3
    2           7      5      8      5     13     -2
    3           3      4      6     -2     12     -5
    4           1      3      5     -3     10     -6
    5          -6      1      0     -4      4     -6
    6          -7     -1     -5     -5      1    -14
  Total         6     18     24      0     54    -30      72

We then calculate the “within schools” sum of squares, that is, the
sum of the squares of the deviations of the mental ages of the individuals
in a school (sample) about their school means, as follows:

$$\sum_t \sum_i (X_{ti} - \bar{X}_t)^2 = (8^2 + 7^2 + 3^2 + \cdots + [-14]^2) - \frac{6^2 + 18^2 + 24^2 + 0^2 + 54^2 + (-30)^2}{6} = 1638 - 792 = 846$$

We next calculate the “between schools” sum of squares, that is, the
sum of squares of deviations of the school means about the grand mean,
as follows:

$$\sum_t \sum_i (\bar{X}_t - \bar{X})^2 = \frac{6^2 + 18^2 + 24^2 + 0^2 + 54^2 + (-30)^2}{6} - \frac{(72)^2}{36} = 792 - 144 = 648$$

The total sum of squares, that is, the sum of squares of the deviations
of the 36 individual mental ages from their grand mean, is obtained as
follows:

$$\sum_t \sum_i (X_{ti} - \bar{X})^2 = (8^2 + 7^2 + 3^2 + \cdots + [-14]^2) - \frac{(72)^2}{36} = 1638 - 144 = 1494$$

The respective sums of squares with the appropriate number of degrees
of freedom are recorded in the customary analysis-of-variance table,
Table 53. The values under the column heading “mean square” are
obtained by dividing the sum of squares in each row by the corresponding
number of degrees of freedom.

TABLE 53

Analysis of Variance of the Mental Ages of the 36 Pupils in 6 Different Schools

Source of variation    d.f.    Sums of squares    Mean square     F      Hypothesis
Between schools          5          648              129.6       4.6     Rejected
Within schools          30          846               28.2
Total                   35         1494

By applying Formula (10.05), we obtain as the observed value of the criterion, $F_0$:

$$F_0 = \frac{129.6}{28.2} = 4.6$$


We then enter the F-table with $n_1 = 5$, $n_2 = 30$, and find that
$F_{.01} = 3.7$. Since our obtained value, 4.6, is greater than 3.7, we may
conclude that there is a significant difference between the mean mental 
ages of the schools. We may also say that the null hypothesis under 
test, that is, the hypothesis stated in Equation (10.02), is rejected. 
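The numerical solution above can be reproduced with a short routine that uses the same shortcut formulas (raw sum of squares, group totals, and the correction term for the grand total). This is a minimal sketch: the function name `one_way_anova` and the `origin` argument (the working origin, 150 in the text) are our own choices, and the 1 per cent point 3.7 is simply the value quoted from the F-table.

```python
def one_way_anova(groups, origin=0):
    """Single-classification analysis of variance from raw sums.

    groups: list of lists of observations (one list per sample).
    origin: working origin subtracted from every value before squaring.
    """
    shifted = [[x - origin for x in g] for g in groups]
    n_total = sum(len(g) for g in shifted)
    raw_ss = sum(x * x for g in shifted for x in g)          # 1638 in the text
    group_term = sum(sum(g) ** 2 / len(g) for g in shifted)  # 792 in the text
    correction = sum(sum(g) for g in shifted) ** 2 / n_total # 144 in the text

    ss_within = raw_ss - group_term
    ss_between = group_term - correction
    df_between = len(groups) - 1      # r
    df_within = n_total - len(groups) # N - q
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": raw_ss - correction,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Table 52, one list per school.
schools = [
    [158, 157, 153, 151, 144, 143],
    [156, 155, 154, 153, 151, 149],
    [160, 158, 156, 155, 150, 145],
    [159, 155, 148, 147, 146, 145],
    [164, 163, 162, 160, 154, 151],
    [153, 148, 145, 144, 144, 136],
]

result = one_way_anova(schools, origin=150)
print(result)
print("significant at the 1 per cent level:", result["F"] > 3.7)
```

Subtracting the working origin changes none of the sums of squares; it only keeps the arithmetic small, which is why the text reduces the values by 150 first.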

Process of the Analysis of Variance . As has been observed in the 
example above, the actual process in the analysis of variance consists in 
breaking up the total sum of squares of deviations of the observations 
from the grand mean into independent portions assigned to certain
factors. The structure of these component parts, usually determined 
by the design of experiment, is specified by the number of degrees of 
freedom or by the number of independent comparisons, which, like the 
corresponding sums of squares, are additive in character. Therefore, 
the method is equally valid for small and large samples. 

Analysis of Covariance. Another useful extension of the general 
analysis-of-variance method is the analysis of covariance, also developed 
by Fisher. In this analysis, the process consists in breaking down the 
sum of products of deviations of any two variates from their means and 
assigning the respective components to specified sources. One of the 
most useful applications of the covariance method is in sorting out the 
covariance effects, particularly in experimentation. This operation 
makes it possible to increase the precision of an experiment by the 
elimination of causes of variation in some cases not controlled or con¬ 
trollable by the experimental design. 

Superiority of Analysis of Variance to the Traditional Biometric 
Method. While in experimentation the special value of the analysis of 
variance is manifest, it has many other applications in dealing with 
observational material. The efficiency of its use in testing if a group of 
samples may be regarded as having come from the same homogeneous 
population is clearly illustrated by comparison with the traditional 
biometric method used for such purposes. In the latter it is customary 
to calculate independently a standard error for each of the possible 
comparisons of the means of several samples. The labor involved in 
this procedure is not its only objection. The chief objection is that in 
many cases the obtained estimates of standard errors may not differ 
beyond merely sampling errors. In such cases it may be concluded 
that the larger part of the observed differences is attributable to random 
sampling errors, and that a more accurate as well as much less compli¬ 
cated analysis would result by pooling the sums of squares of deviations 
from the different means and by applying the combined estimate in the
test of significance. This change introduced by the analysis-of-variance
method serves to provide an exact test of the null hypothesis and hence is
used habitually by the modern research worker. Thus the method makes
use of the relevant information contained in the data, since it takes into
account the sampling distribution of statistics of the same kind.

The foregoing discussion serves to give a general account of the main 
ideas underlying the analysis of variation. Accompanied by the illustra¬ 
tive example, this discussion should be suitable as an introduction for the 
reader to the application of the analysis of variance to the simpler prob¬ 
lems. Probably, however, the research worker will profit from a more 
complete and rigorous study of the statistical principles underlying such 
a powerful tool as the analysis of variance and covariance. It is fre¬ 
quently observed that the formulation of a problem in statistical terms, 
which requires an orderly arrangement of the known results and an aware¬ 
ness of the assumptions and how they may be tested, assists in making 
clear the essential features of a problem hitherto not clearly visualized. 

Before a number of practical applications of the method of analysis 
of variance and covariance are demonstrated, the next section will 
present the systematic formulation and solution of the problems under¬ 
lying these methods. This section may be omitted by the reader not 
interested in the mathematical developments. He can proceed directly 
to the practical problems in Chapter XI. 

Mathematical Foundations of Analysis of Variance and Covariance

Mathematical Ratification. 1. Suppose we have a normal distribution
with mean $\mu$ and standard deviation $\sigma$. It is well known that if
we pick independently all the possible samples of size n from this population
and denote the random effects for each sample by

(1.01)  $z_t = Y_t - \mu \quad (t = 1, \ldots, n)$

then the mean value of $z_t$ will be normally distributed with mean 0 and
standard deviation $\sigma/\sqrt{n}$. So we may define, in this case, the maximum
likelihood estimate of the variance, $\sigma^2$, of the population as

(1.02)  $\sigma^2 = n\,\sigma_{\bar{z}}^2$

where $\sigma_{\bar{z}}^2$ is the variance of sampling means of the random effects.

The analysis-of-variance method consists in the breaking up of the
total variance into independent parts which can produce independently
the maximum-likelihood estimates of $\sigma^2$ due to the random effects alone.
For instance, if we have p groups which are chosen by a certain criterion,
then we immediately know in advance that these groups are more or less
heterogeneous with respect to their corresponding means. However, we
pretend to assume that they are randomly chosen from the whole population
in presenting the mathematical formulation as follows:

(1.03)  $z_{st} = Y_{st} - \mu - a_s \quad (s = 1, \ldots, p;\ t = 1, \ldots, n)$

where $z_{st}$ is the random effect; $Y_{st}$ is an observation of the tth individual
in the sth group; $\mu$ is the population mean; and $a_s$ is the deviation from
the population mean for the sth group. By the maximum-likelihood
method we can easily get two independent estimates of $\sigma^2$ from our
sample:

(1.04)  $\sigma_1^2 = \dfrac{1}{p(n - 1)} \sum_s \sum_t (Y_{st} - \bar{Y}_{s.})^2$

(1.05)  $\sigma_2^2 = \dfrac{n}{p - 1} \sum_s (\bar{Y}_{s.} - \bar{Y}_{..})^2$

where $\bar{Y}_{s.} = \dfrac{\sum_t Y_{st}}{n}$; $\bar{Y}_{..} = \dfrac{\sum_s \sum_t Y_{st}}{pn}$.

By using Fisher’s z-test or the variance ratio F, we can immediately
determine whether or not these two variances are of the same magnitude.

Ordinarily, we are interested only in knowing if these groups have the
same means. So we often make the test on the basis of $\sigma_1^2$, which is called
the variance of “within.” However, the result of significance of the
variance $\sigma_2^2$, which is called the variance of “between,” implies three
alternative explanations. These groups have

(1) Different means and different variabilities.

(2) The same mean and different variabilities.

(3) Different means and the same variability.

Therefore, if we wish to rule out the first two explanations, we have to
test the hypothesis $\sigma_s = \sigma$ for these groups. This may be done by using
the $L_1$-criterion.¹

The same mathematical approach can be applied to the problems of
more than one classification. In this case, we have independent estimates
of $\sigma^2$ due to the interactions in addition to those due to the main factors.²

From the above, we wish to present assumptions which should be
fulfilled in the analysis of variance:³

(1) The population distribution should be normal. This assumption,
however, is not especially important. Eden and Yates (Ref. 2) showed
that even with a population departing considerably from normality, the
effectiveness of the z-distribution still held. The normality and independence
of the random elements in successive observations have been
pointed out on page 212.

(2) All groups of a certain criterion or of the combination of more
than one criterion should be randomly chosen from the subpopulation
having the same criterion or having the same combination of more than
one criterion. For instance, if we wish to select two groups in a school
population, one of the third grade and the other of the fourth grade, we
must choose randomly from the respective subpopulations. This
assumption is the keystone of the analysis-of-variance technique. Failure
to fulfill this assumption gives biased results.

(3) The subgroups under investigation should have the same variability.
We should test this assumption before we run the analysis of variance.
Otherwise, a false interpretation of the results may follow.

¹ For the method of using the $L_1$-criterion, see page 82.

² For a detailed consideration of these interactions, see Refs. 13 and 14 of Chapter
XIII.

³ For assumptions underlying the analysis of variance, see Ref. 3; for a discussion
of the consequences when any assumption is not satisfied, see Ref. 1.

Maximum-Likelihood Solution of Analysis-of-Variance Problems.
With One Classification. 2. Before we develop a general solution of the
problems with any number of classifications, we start with the derivation
of the solution for the problems with only one classification. The frequencies
in different subclasses will always be assumed to be equal. We
denote by $Y_{st}$ the score obtained by the tth individual in the sth subclass.
The basic assumption in the analysis of variance is that we may write

(2.01)  $Y_{st} = M + A_s + z_{st}$

where $s = 1, \ldots, p$; $t = 1, \ldots, n$; p denotes the number of subclasses;
n denotes the number of individuals in each subclass; M is defined
as the general mean; $A_s$ is the deviation due to the sth subclass; and $z_{st}$
represents the random effect for the tth individual in the sth subclass.
To minimize the variance of $z_{st}$ by using the maximum-likelihood method,
we first write

(2.02)  $\chi^2 = \sum_s \sum_t (Y_{st} - M - A_s)^2 + \lambda \sum_s A_s$

where

(2.03)  $\sum_s A_s = 0$

which is a restriction imposed on (2.01); and $\lambda$ is an undetermined multiplier
of Lagrange. Differentiating $\chi^2$ partially with respect to M and $A_s$,
setting the resulting equations equal to zero, and solving, we obtain

(2.04)  $M = \dfrac{1}{N} \sum_s \sum_t Y_{st} = \bar{Y}_{..} \quad (N = pn)$

(2.05)  $A_s = \dfrac{1}{n} \sum_t Y_{st} - M - \dfrac{\lambda}{2n} = \bar{Y}_{s.} - \bar{Y}_{..} - \dfrac{\lambda}{2n}$

From (2.03) and (2.05), we obtain

(2.06)  $\sum_s A_s = \sum_s (\bar{Y}_{s.} - \bar{Y}_{..}) - \dfrac{p\lambda}{2n} = 0$

which reduces to

(2.07)  $\lambda = 0$
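The intermediate steps of this minimization are not written out in the text. The following worked derivation is a sketch of how they would run, assuming (2.02) has the form $\chi^2 = \sum_s \sum_t (Y_{st} - M - A_s)^2 + \lambda \sum_s A_s$ with the restriction $\sum_s A_s = 0$; the bar-and-dot notation for means is ours.

```latex
\begin{aligned}
\frac{\partial \chi^2}{\partial M}
  &= -2 \sum_s \sum_t (Y_{st} - M - A_s) = 0
  &&\Longrightarrow\quad
  M = \bar{Y}_{..} - \frac{1}{p} \sum_s A_s = \bar{Y}_{..} , \\[4pt]
\frac{\partial \chi^2}{\partial A_s}
  &= -2 \sum_t (Y_{st} - M - A_s) + \lambda = 0
  &&\Longrightarrow\quad
  A_s = \bar{Y}_{s.} - \bar{Y}_{..} - \frac{\lambda}{2n} , \\[4pt]
\sum_s A_s
  &= \underbrace{\sum_s (\bar{Y}_{s.} - \bar{Y}_{..})}_{=\,0} - \frac{p\lambda}{2n} = 0
  &&\Longrightarrow\quad
  \lambda = 0 .
\end{aligned}
```

With $\lambda = 0$ the estimate of each subclass effect is simply the deviation of the subclass mean from the general mean, which is what makes the residual sum of squares reduce to the "within" sum of squares.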

By the method of elimination, we have

(2.08)  $\chi_a^2 = \sum_s \sum_t (Y_{st} - \bar{Y}_{s.})^2 = \sum_s \sum_t Y_{st}^2 - n \sum_s \bar{Y}_{s.}^2$

The hypothesis we wish to test is

(2.09)  $H_0: A_s = 0$

that is, the hypothesis that there is no significant difference between
these subclasses. Assuming that $H_0$ is true, we have from (2.02),

(2.10)  $\chi^2 = \sum_s \sum_t (Y_{st} - M)^2$

Minimizing $\chi^2$ with respect to M, we obtain

(2.11)  $M = \bar{Y}_{..}$

Substituting this value into Equation (2.10), we obtain the relative
minimum value $\chi_{H_0}^2$:

(2.12)  $\chi_{H_0}^2 = \sum_s \sum_t (Y_{st} - \bar{Y}_{..})^2 = \sum_s \sum_t Y_{st}^2 - N\bar{Y}_{..}^2 = \chi_a^2 + n \sum_s \bar{Y}_{s.}^2 - N\bar{Y}_{..}^2 = \chi_a^2 + \chi_0^2$

The additive property of the sum of squares is readily demonstrated in
(2.12). All the results obtained may be summarized as in Table 54:


TABLE 54

Analysis of Variance for a Single Classification

Source of variation      D.F.       Sum of squares
Within subclasses        N - p      $\chi_a^2$
Between subclasses       p - 1      $\chi_0^2$
Total                    N - 1      $\sum_s \sum_t (Y_{st} - \bar{Y}_{..})^2$



With Two Classifications. 3. Now we shall work out the equations
with two classifications, say column and row. We denote by $Y_{s_1 s_2 t}$ the
score obtained by the tth individual in the $s_1$th column and the $s_2$th row.
The basic assumption in the analysis of variance is that we may write

(3.01)  $Y_{s_1 s_2 t} = M + A_{s_1} + B_{s_2} + I_{s_1 s_2} + z_{s_1 s_2 t}$

subject to the following restrictions:

(3.02)  $\sum_{s_1} A_{s_1} = 0$

(3.03)  $\sum_{s_2} B_{s_2} = 0$

(3.04)  $\sum_{s_1} \sum_{s_2} I_{s_1 s_2} = 0$

where $s_1 = 1, \ldots, p_1$; $s_2 = 1, \ldots, p_2$; $t = 1, \ldots, n$; $p_1$ denotes
the number of columns; $p_2$ denotes the number of rows; n denotes the
number of individuals in each subclass; M is defined as the general mean;
$A_{s_1}$ is the deviation due to the $s_1$th column; $B_{s_2}$ is the deviation due to the
$s_2$th row; $I_{s_1 s_2}$ represents the influence of the interaction between column
and row; and $z_{s_1 s_2 t}$ represents the random effects. To obtain the solution,
we first write

(3.05)  $\chi^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - M - A_{s_1} - B_{s_2} - I_{s_1 s_2})^2 + \alpha_1 \sum_{s_1} A_{s_1} + \alpha_2 \sum_{s_2} B_{s_2} + \alpha_3 \sum_{s_1} \sum_{s_2} I_{s_1 s_2}$

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the undetermined multipliers of Lagrange.
Minimizing $\chi^2$ with respect to M, $A_{s_1}$, $B_{s_2}$, and $I_{s_1 s_2}$, we obtain


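(3.06)  $M = \dfrac{1}{N} \sum_{s_1} \sum_{s_2} \sum_t Y_{s_1 s_2 t} = \bar{Y}_{...} \quad (N = p_1 p_2 n)$

(3.07)  $A_{s_1} = \dfrac{1}{p_2 n} \sum_{s_2} \sum_t Y_{s_1 s_2 t} - M - \dfrac{1}{p_2} \sum_{s_2} I_{s_1 s_2} - \dfrac{\alpha_1}{2np_2} = \bar{Y}_{s_1 ..} - \bar{Y}_{...} - \dfrac{1}{p_2} \sum_{s_2} I_{s_1 s_2} - \dfrac{\alpha_1}{2np_2}$

(3.08)  $B_{s_2} = \dfrac{1}{p_1 n} \sum_{s_1} \sum_t Y_{s_1 s_2 t} - M - \dfrac{1}{p_1} \sum_{s_1} I_{s_1 s_2} - \dfrac{\alpha_2}{2np_1} = \bar{Y}_{. s_2 .} - \bar{Y}_{...} - \dfrac{1}{p_1} \sum_{s_1} I_{s_1 s_2} - \dfrac{\alpha_2}{2np_1}$

(3.09)  $I_{s_1 s_2} = \dfrac{1}{n} \sum_t Y_{s_1 s_2 t} - M - A_{s_1} - B_{s_2} - \dfrac{\alpha_3}{2n} = \bar{Y}_{s_1 s_2 .} - \bar{Y}_{...} - A_{s_1} - B_{s_2} - \dfrac{\alpha_3}{2n}$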


(3.10) 


V V. . 2,4 1 4^2 


2 npi 


= 0 
From (3.02) and (3.07), we have

$\sum_{s_1} A_{s_1} = -\dfrac{p_1 \alpha_1}{2np_2} = 0$

which immediately reduces to

(3.11)  $\alpha_1 = 0$

Similarly,

(3.12)  $\alpha_2 = \alpha_3 = 0$

By the method of elimination, we get

(3.13)  $\chi_a^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{s_1 s_2 .})^2 = \sum_{s_1} \sum_{s_2} \sum_t Y_{s_1 s_2 t}^2 - n \sum_{s_1} \sum_{s_2} \bar{Y}_{s_1 s_2 .}^2$

The hypothesis we wish to test first is

(3.14)  $H_1: I_{s_1 s_2} = 0$

that is, the hypothesis that there is no influence of the interaction
between column and row. Assuming that $H_1$ is true, we have, from
(3.05),

(3.15)  $\chi^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - M - A_{s_1} - B_{s_2})^2 + \theta_1 \sum_{s_1} A_{s_1} + \theta_2 \sum_{s_2} B_{s_2}$

where $\theta_1$ and $\theta_2$ are the undetermined multipliers of Lagrange. Minimizing
$\chi^2$ with respect to M, $A_{s_1}$, and $B_{s_2}$, we obtain


(3.16)  $M = \bar{Y}_{...}$

(3.17)  $A_{s_1} = \bar{Y}_{s_1 ..} - \bar{Y}_{...} - \dfrac{\theta_1}{2p_2 n}$

(3.18)  $B_{s_2} = \bar{Y}_{. s_2 .} - \bar{Y}_{...} - \dfrac{\theta_2}{2p_1 n}$

where

(3.19)  $\theta_1 = \theta_2 = 0$

Substituting these values in (3.15) and simplifying, we obtain the relative
minimum value $\chi_{H_1}^2$:

(3.20)  $\chi_{H_1}^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{s_1 ..} - \bar{Y}_{. s_2 .} + \bar{Y}_{...})^2$
        $= \chi_a^2 + \sum_{s_1} \sum_{s_2} \sum_t (\bar{Y}_{s_1 s_2 .} - \bar{Y}_{s_1 ..} - \bar{Y}_{. s_2 .} + \bar{Y}_{...})^2$
        $= \chi_a^2 + n \sum_{s_1} \sum_{s_2} \bar{Y}_{s_1 s_2 .}^2 - p_2 n \sum_{s_1} \bar{Y}_{s_1 ..}^2 - p_1 n \sum_{s_2} \bar{Y}_{. s_2 .}^2 + N\bar{Y}_{...}^2$
        $= \chi_a^2 + \chi_1^2$

Then we may test the relative hypothesis on the basis of $\chi_{H_1}^2$:

(3.21)  $H_{01}: A_{s_1} = 0$

that is, the hypothesis that there is no significant difference between
columns. Assuming that $H_{01}$ is true, we may write

(3.22)  $\chi^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - M - B_{s_2})^2 + \gamma \sum_{s_2} B_{s_2}$

where $\gamma$ is the undetermined multiplier of Lagrange. Minimizing $\chi^2$
with respect to M and $B_{s_2}$, we have

(3.23)  $M = \bar{Y}_{...}$

(3.24)  $B_{s_2} = \bar{Y}_{. s_2 .} - \bar{Y}_{...} - \dfrac{\gamma}{2p_1 n}$

where

(3.25)  $\gamma = 0$

Substituting these values into (3.22) and simplifying, we obtain

(3.26)  $\chi_{H_{01}}^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{. s_2 .})^2 = \chi_a^2 + \chi_1^2 + p_2 n \sum_{s_1} \bar{Y}_{s_1 ..}^2 - N\bar{Y}_{...}^2 = \chi_a^2 + \chi_1^2 + \chi_{01}^2$

Finally, we may test the relative hypothesis on the basis of $\chi_{H_{01}}^2$:

(3.27)  $H_{02}: B_{s_2} = 0$

that is, the hypothesis that there is no significant difference between
rows. Assuming that $H_{02}$ is true and proceeding as before, we obtain

(3.28)  $\chi_{H_{02}}^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{...})^2 = \chi_a^2 + \chi_1^2 + \chi_{01}^2 + p_1 n \sum_{s_2} \bar{Y}_{. s_2 .}^2 - N\bar{Y}_{...}^2 = \chi_a^2 + \chi_1^2 + \chi_{01}^2 + \chi_{02}^2$

From (3.28), the additive property of the sum of squares is again clearly
demonstrated. It is also noted, in the case of equal frequencies in subclasses,
that there is only one answer for each hypothesis tested, no
matter what the order of testing may be. All the results obtained
may be summarized as in Table 55.

TABLE 55

Analysis of Variance for the Problems of Double Classification
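The computational forms of the four components (within, interaction, columns, rows) for equal subclass frequencies can be illustrated with a short routine. This is only a sketch: the function name `two_way_anova`, the dictionary keys, and the small data set are hypothetical and are not taken from the text.

```python
def two_way_anova(cells):
    """Sums of squares for a double classification with equal frequencies.

    cells[s1][s2] is the list of n scores in column s1 and row s2.
    """
    p1, p2, n = len(cells), len(cells[0]), len(cells[0][0])
    N = p1 * p2 * n

    raw = sum(y * y for col in cells for cell in col for y in cell)
    cell_means = [[sum(cell) / n for cell in col] for col in cells]
    col_means = [sum(col) / p2 for col in cell_means]
    row_means = [sum(cell_means[s1][s2] for s1 in range(p1)) / p1
                 for s2 in range(p2)]
    grand = sum(col_means) / p1

    cell_term = n * sum(m * m for col in cell_means for m in col)
    col_term = p2 * n * sum(m * m for m in col_means)
    row_term = p1 * n * sum(m * m for m in row_means)
    correction = N * grand * grand

    return {
        "within": raw - cell_term,                                    # as in (3.13)
        "interaction": cell_term - col_term - row_term + correction,  # as in (3.20)
        "columns": col_term - correction,                             # as in (3.26)
        "rows": row_term - correction,                                # as in (3.28)
        "total": raw - correction,
        "df": {"within": N - p1 * p2, "interaction": (p1 - 1) * (p2 - 1),
               "columns": p1 - 1, "rows": p2 - 1, "total": N - 1},
    }


# Hypothetical scores: 2 columns, 3 rows, 2 individuals per subclass.
cells = [
    [[4, 6], [7, 9], [10, 12]],
    [[5, 7], [9, 11], [14, 16]],
]
ss = two_way_anova(cells)
parts = ss["within"] + ss["interaction"] + ss["columns"] + ss["rows"]
print(ss)
print("components add up to the total:", abs(parts - ss["total"]) < 1e-9)
```

Because the subclass frequencies are equal, the four components add to the total whatever the order in which the hypotheses are tested, which is the point made in the text.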

4. In general, if we have a problem of k classifications, the mathematical
expression of the score made by the tth individual in the $s_1$th
group of classification A, the $s_2$th group of classification B, . . . , and
the $s_k$th group of classification R is as follows:

(4.01)  $Y_{s_1 s_2 \cdots s_k t} = M + A_{s_1} + B_{s_2} + \cdots + R_{s_k} + I_{s_1 s_2} + \cdots + I_{s_1 s_2 s_3} + \cdots + I_{s_{k-2} s_{k-1} s_k} + \cdots + I_{s_1 s_2 \cdots s_k} + z_{s_1 s_2 \cdots s_k t}$

where $s_1 = 1, \ldots, p_1$; $s_2 = 1, \ldots, p_2$; . . . ; $s_k = 1, \ldots, p_k$; $p_1$
denotes the number of groups in classification A; $p_2$ denotes the number
of groups in classification B; . . . ; $p_k$ denotes the number of groups
of classification R; M is the grand mean; A, B, . . . , and R are the
measures of the main effects with respect to their own subscripts; the I's are
the measures of the interactions with respect to their own subscripts; and
$z_{s_1 s_2 \cdots s_k t}$ is the error. The solutions for the sum of squares of each
source of variation are as follows:

1. Within: 


(4.02) 


i ■ ■ ■ .ii*-i ■ ■ ■ 


Jfc-fold 


*-fold 


subscripts 


2. Interactions and main effects: 


r 



r-fold 

% < <j 


subscripts 

»*<••< i 


r-l fold subscripts 

!<■••< i < • • • < y 


+$(•.£••■■In. ..)-••• 


r-2 


r-2 fold subseapts 

i < ■ • < j i < • <j 


+ (-1 )'N?* 


where $i, \ldots, j = 1, 2, \ldots, k$; k is the number of classifications in the
whole study; r is the number of classifications under calculation; $s_i$ (or $s_j$)
$= 1, 2, \ldots, p_i$ (or $p_j$); $S_m$ is so determined that

(4.04) ‘ ' ' »^ Sm = N ~ ’••»?*» (m = 1, • • • , r) 

« */ 

and throughout the general expression the summations and the subscripts 
which are not connected with the classifications under calculation should 
be ruled out. For instance, if we calculate the sum of squares for the 
interaction between A and B, the formula becomes 
(4.05)  $p_3 \cdots p_k n \sum_{s_1} \sum_{s_2} \bar{Y}_{s_1 s_2 \cdots}^2 - \left( p_2 p_3 \cdots p_k n \sum_{s_1} \bar{Y}_{s_1 \cdots}^2 + p_1 p_3 \cdots p_k n \sum_{s_2} \bar{Y}_{s_2 \cdots}^2 \right) + N\bar{Y}_{\cdots}^2$

For another example, if we calculate the sum of squares for the main
effect A, the formula becomes⁴

(4.06)  $p_2 \cdots p_k n \sum_{s_1} \bar{Y}_{s_1 \cdots}^2 - N\bar{Y}_{\cdots}^2$
CHAPTER XI


APPLICATIONS OF THE ANALYSIS OF VARIANCE AND 
COVARIANCE METHOD 

We shall now apply the method of analysis of variance and covariance 
to a number of the simpler practical problems met with by the research 
worker. The application to some of the more complex types of situations 
in which these methods are indispensable will be made in the sequel 
to the results of specific experimental designs. 

We shall proceed first by applying the principles presented in Chap¬ 
ter X to the mathematical solution of the problem. We shall then 
carry out the necessary calculations for the solution and interpretation. 
We begin with the simplest case of the analysis of variance, where there 
is a single criterion of classification. 

Problem XI.1. Single classification with equal representation in
classes. Let us take the problem of measuring the resemblance in intel¬ 
ligence of identical twins reared apart as reported by Newman, Freeman, 
and Holzinger (Ref. 7). The data in the form in which we shall use them 
in this analysis are given in Table 56. We must first see if we can trans¬ 
late our problem into mathematical language. If it is amenable to such 
a translation, it can be expressed mathematically as a problem of testing 
statistical hypotheses. Mathematically, the relationship may be 

expressed thus: 

$$X_{it} = A + C_t + z_{it} \tag{11.01}$$

where $i = 1, 2$; $t = 1, 2, 3, \ldots, n$ (= 19); $X_{it}$ is the mental age of
the ith member of the tth pair of twins; A is a measure of the common
mental age of the group tested; $C_t$ is a measure of the mental age of the
tth twin pair; $z_{it}$ is the measure of the random effects. The restriction is

$$\sum_t C_t = 0 \tag{11.02}$$

We must first test the hypothesis that the variability of the mental-age
scores is the same for all twin pairs, since this is the fundamental
assumption underlying the analysis of variance. The hypothesis may be
written

$$H_0: \sigma_t = \sigma \tag{11.03}$$

where $\sigma_t$ denotes the standard deviation of the tth twin pair. This
hypothesis is tested by means of the $L_1$-test (see page 82). The calculations
are carried out as indicated in Table 57. With a value of $L_1 = .117$,
k = 19, and d.f. = 1, we refer to Nayer’s table (Table V, Appendix)
and find that our value is greater than the table value ($L_{1,.01} = .096$).
Hence, we may accept the hypothesis $H_0$ at the 1 per cent level.¹ We can
now proceed to apply the analysis of variance method.

TABLE 56

Mental Ages of 19 Pairs of Identical Twins Reared Apart

Twin pair        Mental age             Sum             Difference
   (t)        X_1t        X_2t       X_1t + X_2t       |X_1t - X_2t|
    1          163         186           349                23
    2          126         149           275                23
    3          191         194           385                 3
    4          170         204           374                34
    5          171         178           349                 7
    6          195         180           375                15
    7          170         172           342                 2
    8          170         142           312                28
    9          195         185           380                10
   10          187         195           382                 8
   11          176         222           398                46
   12          223         210           433                13
   13          181         182           363                 1
   14          164         161           325                 3
   15          175         171           346                 4
   16          123         120           243                 3
   17          192         175           367                17
   18          184         148           332                36
   19          168         151           319                17
  Sum        3,324       3,325         6,649
  Sum of
  squares  590,946     593,531     2,361,311             7,643


We use the maximum-likelihood procedure of estimating the sums of
squares of the different components as shown below. We first write

$$\phi = \sum_i \sum_t (X_{it} - A - C_t)^2 + 2\lambda \sum_t C_t \tag{11.04}$$

where $\lambda$ is the multiplier of Lagrange. Minimizing $\phi$ with respect to A,
$C_t$, and $\lambda$, that is, differentiating partially with respect to A, $C_t$, and $\lambda$,
equating the resulting equations to zero, and solving them for the values
A, $C_t$, and $\lambda$, we obtain

$$A = \frac{1}{2n} \sum_i \sum_t X_{it} = \bar{X}_{..} \tag{11.05}$$

$$C_t = \frac{1}{2} \sum_i X_{it} - \bar{X}_{..} = \bar{X}_{.t} - \bar{X}_{..} \tag{11.06}$$

$$\lambda = 0 \tag{11.07}$$

¹ It should be pointed out that for the case n = 2, the $L_1$-test may sometimes be
indecisive. We are accepting the hypothesis here at the 1 per cent level.



TABLE 57

Calculation of $L_1$ in Testing $H_0: \sigma_t = \sigma$

(Here $\theta_t' = \sum_i (X_{it} - \bar{X}_{.t})^2$.)

  n_t     log n_t     n_t log n_t      θ_t'       log θ_t'     n_t log θ_t'
   2       .30103       .60206        264.5       2.42243        4.84486
   2       .30103       .60206        264.5       2.42243        4.84486
   2       .30103       .60206          4.5        .65321        1.30642
   2       .30103       .60206        578.0       2.76193        5.52386
   2       .30103       .60206         24.5       1.38917        2.77834
   2       .30103       .60206        112.5       2.05115        4.10230
   2       .30103       .60206          2.0        .30103         .60206
   2       .30103       .60206        392.0       2.59329        5.18658
   2       .30103       .60206         50.0       1.69897        3.39794
   2       .30103       .60206         32.0       1.50515        3.01030
   2       .30103       .60206       1058.0       3.02449        6.04898
   2       .30103       .60206         84.5       1.92686        3.85372
   2       .30103       .60206           .5       -.30103        -.60206
   2       .30103       .60206          4.5        .65321        1.30642
   2       .30103       .60206          8.0        .90309        1.80618
   2       .30103       .60206          4.5        .65321        1.30642
   2       .30103       .60206        144.5       2.15987        4.31974
   2       .30103       .60206        648.0       2.81158        5.62316
   2       .30103       .60206        144.5       2.15987        4.31974

 N = 38   log N = 1.57978   11.43914   3821.5   log Σθ_t' = 3.58224   63.57982


Substituting these values in (11.04) to obtain the absolute minimum
value of $\phi$, we have

$$\chi_a^2 = \sum_i \sum_t (X_{it} - \bar{X}_{.t})^2 \tag{11.08}$$

which is the basis for testing the following hypothesis:

$$H_1: E(C_t) = 0 \tag{11.09}$$

(E is the notation for expectation of a parameter), that is, the hypothesis
that the mental age of an individual is independent
of the particular twin pair to which the individual belongs. If the
hypothesis is true, then (11.04) becomes

$$\phi = \sum_i \sum_t (X_{it} - A)^2 \tag{11.10}$$

Minimizing with respect to A and substituting the obtained value,
$A = \bar{X}_{..}$, in (11.10), we obtain the relative minimum:

$$\chi_{H_1}^2 = \sum_i \sum_t (X_{it} - \bar{X}_{..})^2 = \sum_i \sum_t (X_{it} - \bar{X}_{.t})^2 + 2 \sum_t (\bar{X}_{.t} - \bar{X}_{..})^2 \tag{11.11}$$

$$= \chi_a^2 + \chi_1^2 \tag{11.12}$$

where $\chi_a^2$ is the estimate of sum of squares for “within” and $\chi_1^2$ is the
estimate of the sum of squares for “between.” Then the test of $H_1$ is
given by

$$F = \frac{n}{n - 1} \cdot \frac{\chi_1^2}{\chi_a^2} \tag{11.13}$$

with $n_1 = n - 1$ and $n_2 = n$. For purposes of calculation it is simpler
to write $\chi_a^2$ and $\chi_1^2$ in the form

$$\chi_a^2 = \frac{1}{2} \sum_t (X_{1t} - X_{2t})^2 \tag{11.14}$$

$$\chi_1^2 = \frac{1}{2} \sum_t (X_{1t} + X_{2t})^2 - \frac{1}{2n} \Big( \sum_i \sum_t X_{it} \Big)^2 \tag{11.15}$$

Calculations may be checked by computing

$$\chi_{H_1}^2 = \sum_i \sum_t X_{it}^2 - \frac{1}{2n} \Big( \sum_i \sum_t X_{it} \Big)^2 \tag{11.16}$$

separately, and using the identity,

$$\chi_{H_1}^2 = \chi_a^2 + \chi_1^2 \tag{11.17}$$


The efficient way to calculate the necessary values is shown in Table 
56; we first form the sum and difference for each pair of values. We then 
calculate the sum and sum of squares for each column, except the last, 
where the sum of squares only is needed. By this method we secure 
a check on the calculations at each stage. 

From the last two rows of Table 56, we obtain

$$\sum_t (X_{1t} - X_{2t})^2 = 7643 \tag{11.18}$$

$$\sum_t (X_{1t} + X_{2t})^2 = 2{,}361{,}311 \tag{11.19}$$

$$\sum_i \sum_t X_{it} = 6649 \tag{11.20}$$

$$\sum_i \sum_t X_{it}^2 = \frac{1}{2} \Big[ \sum_t (X_{1t} + X_{2t})^2 + \sum_t (X_{1t} - X_{2t})^2 \Big] = 1{,}184{,}477 \tag{11.21}$$

Substituting these values in (11.14), (11.15), and (11.16), we have

$\chi_a^2 = 3{,}821.5$
$\chi_1^2 = 17{,}255.5$
$\chi_{H_1}^2 = 21{,}077.0$

We now place all of these values in one table as shown in Table 58. 


TABLE 58

Analysis of Variance of Mental Ages of Identical Twins Reared Apart

Source of variation    D.F.    Sum of squares    Mean square      F       Hypothesis
Within pairs            19         3,821.5         201.132
Between pairs           18        17,255.5         958.638       4.766    Rej.
Total                   37        21,077.0


Referring to the F-table with $n_1 = 18$ and $n_2 = 19$, we find that the
obtained value of F is significant at the 1 per cent level. This statement 
means that the mental age of an individual is not independent of the twin 
pair to which he belongs, or that there is a significant difference among the 
means of the 19 twin pairs. Another interpretation is that the intraclass 
correlation between twins is significantly greater than zero. Intraclass 
correlation is discussed below. 

Fisher (Reference 3) has shown that an unbiased estimate of the
intraclass correlation, r', can be obtained from the relation

$$F = \frac{1 + (k - 1)r'}{1 - r'} \tag{11.22}$$

where k is the number in a group or class. Where k = 2,

$$F = \frac{1 + r'}{1 - r'} \tag{11.23}$$

Thus in our problem,

$$\frac{1 + r'}{1 - r'} = \frac{958.638}{201.132} = 4.7662, \qquad r' = .653$$

Also,

$$r' = \frac{958.638 - 201.132}{958.638 + (2 - 1)(201.132)} = .653$$
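The sum-and-difference shortcut of Table 56 and the resulting F and r' can be retraced in a few lines. The sketch below uses the 19 pairs of Table 56; the variable names are ours, and the relation r' = (F - 1)/(F + 1) is simply (11.23) solved for r'.

```python
# (X_1t, X_2t) for the 19 twin pairs of Table 56.
pairs = [
    (163, 186), (126, 149), (191, 194), (170, 204), (171, 178),
    (195, 180), (170, 172), (170, 142), (195, 185), (187, 195),
    (176, 222), (223, 210), (181, 182), (164, 161), (175, 171),
    (123, 120), (192, 175), (184, 148), (168, 151),
]
n = len(pairs)

sum_sq_diff = sum((a - b) ** 2 for a, b in pairs)  # Equation (11.18)
sum_sq_sum = sum((a + b) ** 2 for a, b in pairs)   # Equation (11.19)
grand_total = sum(a + b for a, b in pairs)         # Equation (11.20)
raw_ss = sum(a * a + b * b for a, b in pairs)      # Equation (11.21)

within = sum_sq_diff / 2                                # sum of squares within pairs
between = sum_sq_sum / 2 - grand_total ** 2 / (2 * n)   # sum of squares between pairs
total = raw_ss - grand_total ** 2 / (2 * n)             # total, used as a check

ms_within = within / n           # 19 degrees of freedom
ms_between = between / (n - 1)   # 18 degrees of freedom
F = ms_between / ms_within
r_intraclass = (F - 1) / (F + 1)  # (11.23) solved for r' when k = 2

print(within, round(between, 1), round(total, 1))
print(round(F, 3), round(r_intraclass, 3))
```

Forming sums and differences first means only two columns need squaring, and the raw sum of squares then serves as an independent check on both.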

When there are equal numbers in the classes or groups, the variation 
of the class means relative to the variation of the individuals within the 
classes is measured by the intraclass correlation. If the class means 
differ significantly, a significant positive intraclass correlation is indi¬ 
cated; when the mean square between classes equals that within classes, 
the correlation is zero; and if the mean square between classes is less than
that within classes, the intraclass correlation is negative. 

Problem XI.2. Testing the homogeneity of multiple groups of 
measurements. We shall apply the analysis-of-variance method to 
test the homogeneity of 6 sections in college zoology with respect to their 
achievement as measured by a final examination. The basic data are 
given in Table 59. 

Denote by $X_{st}$ the score of the tth student in the sth section. The
basic assumption in the analysis is that we may write

$$X_{st} = A + B_s + z_{st} \tag{11.24}$$

where $s = 1, 2, \ldots, k$; $t = 1, 2, \ldots, n_s$; $n_s$ denotes the number of
students in the sth section, and k denotes the number of sections. A is a
measure of the achievement of all the students and is defined as the mean
score for all individuals and sections; $B_s$ is a measure of the achievement
of the sth section; $z_{st}$ is a measure of random effects, assumed to be
normally distributed about zero with constant standard deviation, $\sigma$.
The restriction is

$$\sum_s n_s B_s = 0 \tag{11.25}$$

In assuming that $\sigma$ is constant, we are assuming that the variability
of the scores is the same for each section. This assumption may not be
fulfilled in practice, and hence, we must first test the hypothesis

$$H_0: \sigma_s = \sigma \tag{11.26}$$

TABLE 59

Sums and Sums of Squares of Scores for Each Section in College Zoology

Section   No. of students   Sum of scores   Sum of squares   Sum of squares about means
               n_s               ΣX          of scores ΣX²        ΣX² − (ΣX)²/n_s
I              145             23,025          3,759,061           102,849.7931
II              91             13,529          2,065,833            54,472.1099
III             84             13,127          2,130,435            79,028.7024
IV             127             18,825          2,912,131           121,732.3779
V               46              6,828          1,071,968            58,455.3043
VI              82             12,889          2,108,159            82,228.2560
Total          575             88,223         14,047,587           511,417.0036


where $\sigma_s$ denotes the standard deviation of the scores in the sth section.
If this hypothesis is accepted, we conclude that there is no difference in
variability among the sections and then proceed to test the other hypothesis.
If we reject the hypothesis $H_0$, we cannot make an exact test of
another hypothesis.

The test of the hypothesis $H_0$ may be made as follows. We calculate

$$L_1 = \frac{N \prod_s \left(\theta'_s / n_s\right)^{n_s/N}}{\sum_s \theta'_s} \tag{11.27}$$

where $N = \sum_s n_s$, $\prod$ denotes the product, and $\theta'_s$ denotes the "within" sections
sum of squares for the sth section. We refer to Nayer's tables of the
$L_1$-distribution with k = 6 and d.f. equal to the harmonic mean of $f_s$,
where $f_s$ denotes the d.f. associated with $\theta'$ in the sth section. The rule
to be followed in using these tables is to reject the hypothesis when the
calculated value of $L_1$ is less than the corresponding 1 per cent point
given in the table.

The computation of $L_1$ for the 6 sections is carried out as shown in
Table 60.


TABLE 60

Calculation of L_1 for the Test of the Hypothesis H_0: σ_s = σ

   n_s       log n_s     n_s log n_s        θ'_s         log θ'_s    n_s log θ'_s
   145       2.16137      313.39865     102,849.7931     5.01221      726.77045
    91       1.95904      178.27264      54,472.1099     4.73618      430.99238
    84       1.92428      161.63952      79,028.7024     4.89778      411.41352
   127       2.10380      267.18260     121,732.3779     5.08541      645.84707
    46       1.66276       76.48696      58,455.3043     4.76682      219.27372
    82       1.91381      156.93242      82,228.2560     4.91502      403.03164

 N = 575   log N = 2.75967   1153.91279   Σθ'_s = 498,766.5436   log Σθ'_s = 5.69790   2837.32878


To find the value of $L_1$, we calculate the value of $\log L_1$, where

$$\log L_1 = \log N - \frac{1}{N}\sum_s n_s \log n_s + \frac{1}{N}\sum_s n_s \log \theta'_s - \log \sum_s \theta'_s$$

and then find $L_1$ from a table of antilogarithms.

Here, $\log L_1 = 2.75967 - 2.00680 + 4.93448 - 5.69790 = 9.98945 - 10$, so that
$L_1 = .9760$.

The harmonic mean of the $f_s$ is

$$\frac{6}{\frac{1}{144} + \frac{1}{90} + \frac{1}{83} + \frac{1}{126} + \frac{1}{45} + \frac{1}{81}} = 82.64$$

Referring to Nayer's tables with k = 6 and d.f. = 83, we find that
P > .05. We accept the hypothesis $H_0$ and conclude that the sections
are of equal variability. Consequently, we can proceed to the analysis
of variance.
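The logarithmic computation of Table 60 can be reproduced directly. The sketch below assumes the $n_s$ and $\theta'_s$ values of Table 59 and takes $f_s = n_s - 1$; the variable names are illustrative, and the tabled critical points of Nayer are not reproduced here.

```python
import math

# Section sizes and "within" sums of squares from Tables 59 and 60.
n = [145, 91, 84, 127, 46, 82]
theta = [102849.7931, 54472.1099, 79028.7024,
         121732.3779, 58455.3043, 82228.2560]

N = sum(n)

# log L1 = log N - (1/N) sum n log n + (1/N) sum n log theta - log sum theta
log_L1 = (math.log10(N)
          - sum(ns * math.log10(ns) for ns in n) / N
          + sum(ns * math.log10(th) for ns, th in zip(n, theta)) / N
          - math.log10(sum(theta)))
L1 = 10 ** log_L1

# Degrees of freedom within each section and their harmonic mean.
df = [ns - 1 for ns in n]
harmonic_mean_df = len(df) / sum(1 / f for f in df)

print(round(L1, 4), round(harmonic_mean_df, 2))
```

Working in logarithms, as the text does, avoids forming the very large product in (11.27) directly.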

The next step is to estimate the sum of squares for "within." By the
method of maximum likelihood, we obtain

$$\phi = \sum_s \sum_t (X_{st} - A - B_s)^2 + 2\lambda \sum_s n_s B_s \tag{11.28}$$

Differentiating partially with respect to A, $B_s$, and $\lambda$, equating these
equations to zero, and solving the resulting equations for the values of
A, $B_s$, and $\lambda$, we obtain

$$A = \frac{\sum_s \sum_t X_{st}}{N} = \bar{X}_{\cdot\cdot} \tag{11.29}$$

$$B_s = \frac{\sum_t X_{st}}{n_s} - A = \bar{X}_{s\cdot} - \bar{X}_{\cdot\cdot} \tag{11.30}$$

$$\lambda = 0 \tag{11.31}$$

Substituting these values in (11.28) to obtain the absolute minimum
value of $\phi$, we have

$$\chi_a^2 = \sum_s \sum_t X_{st}^2 - \sum_s \frac{\left(\sum_t X_{st}\right)^2}{n_s} \tag{11.32}$$

which is the basis of testing the following hypothesis:

$$H_1: E(B_s) = 0 \quad (E \text{ is notation for expectation of a parameter}) \tag{11.33}$$

that is, the hypothesis that the sections are equal in achievement. If the
hypothesis is true, then (11.28) becomes

$$\phi = \sum_s \sum_t (X_{st} - A)^2 \tag{11.34}$$

Minimizing with respect to A and substituting the obtained value of
$A = \bar{X}_{\cdot\cdot}$ in (11.34), we obtain the relative minimum:

$$\chi_r^2 = \sum_s \sum_t X_{st}^2 - \frac{\left(\sum_s \sum_t X_{st}\right)^2}{N}
= \chi_a^2 + \left[\sum_s \frac{\left(\sum_t X_{st}\right)^2}{n_s} - \frac{\left(\sum_s \sum_t X_{st}\right)^2}{N}\right]
= \chi_a^2 + \chi_b^2 \tag{11.35}$$

where $\chi_a^2$ is the estimate of sum of squares for "within" and $\chi_b^2$ is the
estimate of sum of squares for "between." Then the test of $H_1$ is given
by

$$F = \frac{(N - 6)\,\chi_b^2}{5\,\chi_a^2} \tag{11.36}$$

with $n_1 = 5$ and $n_2 = N - 6$.

The "within" sum of squares may be obtained directly from the last
row of Table 60, $\sum_s \theta'_s = 498{,}766.5436$. The "between" sum of squares
is calculated from the totals given in the third column of Table 59 as
follows:

$$\frac{(23{,}025)^2}{145} + \frac{(13{,}529)^2}{91} + \frac{(13{,}127)^2}{84} + \frac{(18{,}825)^2}{127} + \frac{(6828)^2}{46} + \frac{(12{,}889)^2}{82} - \frac{(88{,}223)^2}{575} = 12{,}650.4599$$

The total sum of squares is

$$\sum_s \sum_t X_{st}^2 - \frac{\left(\sum_s \sum_t X_{st}\right)^2}{N} = 14{,}047{,}587 - \frac{(88{,}223)^2}{575} = 511{,}417.0036$$

To test the hypothesis $H_1$, we calculate

$$F = \frac{569}{5} \cdot \frac{12{,}650.46}{498{,}766.5436} = \frac{2530.092}{876.567} = 2.886$$

Referring to the F tables with $n_1 = 5$ and $n_2 = 569$, we find that
.05 > P > .01. Statistically, the acceptance of $H_1$ remains in doubt.
We may state that the differences among the means of sections are
significant at the 5 per cent level but not significant at the 1 per cent
level. The results are summarized in Table 61.
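The whole analysis can be carried out from the three columns of Table 59 alone. The following is a minimal sketch under that assumption; the function name and the dictionary keys are illustrative. The last decimals of the "between" and "total" sums of squares may differ slightly from the printed figures because of rounding in the hand computation.

```python
def one_way_anova_from_sums(n, sums, sums_sq):
    """One-way analysis of variance from per-group n, sum X and sum X^2."""
    N, k = sum(n), len(n)
    correction = sum(sums) ** 2 / N              # (grand total)^2 / N
    total = sum(sums_sq) - correction
    between = sum(s * s / m for s, m in zip(sums, n)) - correction
    within = total - between
    ms_between = between / (k - 1)
    ms_within = within / (N - k)
    return {
        "between": between, "within": within, "total": total,
        "df_between": k - 1, "df_within": N - k,
        "F": ms_between / ms_within,
    }


# Columns of Table 59: number of students, sum of scores, sum of squares.
n = [145, 91, 84, 127, 46, 82]
sums = [23025, 13529, 13127, 18825, 6828, 12889]
sums_sq = [3759061, 2065833, 2130435, 2912131, 1071968, 2108159]

result = one_way_anova_from_sums(n, sums, sums_sq)
print(result["df_between"], result["df_within"], round(result["F"], 3))
```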


TABLE 61

Analysis of Variance of Scores in Different Sections in Zoology

Variance            D.F.   Sum of squares   Mean square     F      Hypothesis
Between sections       5      12,650.46       2530.092    2.886    Remains in doubt
Within sections      569     498,766.5436      876.567
Total                574     511,417.0036


When a significant difference has been found among the means of the 
sections, it may be of interest to make special comparisons between the 
means of any two of the sections. Here the usual test of significance 
between means cannot be applied, since the two means cannot now be 
regarded as randomly drawn from a normal population. Fisher (Ref. 2) 
has suggested a test taking into account the probability of a random 
sampling based on binomial theory. This method will be illustrated by 
comparing the mean of the highest with that of the lowest section. The 
analysis of variance could be used, but we shall apply the t-test with the 
modifications indicated. 

We shall test the significance of the difference between the means of
Section I and of Section IV, 158.8 and 148.2, respectively. We find that

$$t = \frac{\bar{x}_1 - \bar{x}_4}{\cdots} = \frac{158.8 - 148.2}{\cdots}$$

and that for n = 270 the corresponding P < .01. The difference selected,
however, is only 1 of 15 that might be found among the means of the six
sections. The required probability for the selected difference to be
significant is set, therefore, not as 1 in 100 but as 1 in (15)(100) = 1500.
Since the probability corresponding to the observed value of t is about
.0024, or 2.4 in 1000, it is greater than 1 in 1500 (about 0.67 in 1000) and
therefore is regarded as not significant.
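The adjustment for selecting the largest of several possible differences amounts to dividing the chosen significance level by the number of pairs of means. A minimal sketch follows, assuming the probability of about .0024 quoted in the text for the observed t; the variable names are illustrative.

```python
from math import comb

k = 6                 # number of sections
alpha = 0.01          # nominal level for a single, unselected comparison

n_pairs = comb(k, 2)              # number of differences among k means
threshold = alpha / n_pairs       # required probability for the selected one

p_observed = 0.0024               # probability quoted in the text for t
significant = p_observed < threshold

print(n_pairs, round(threshold, 5), significant)
```

With six sections there are 15 possible differences, so the selected difference would have to reach a probability of about .00067 to be judged significant at the 1 per cent level.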

Problem XI.3. An application of the analysis of covariance. The
process of applying the analysis of covariance consists in breaking up
the sum of products into parts assignable to different factors. This is
comparable to the process of breaking up the sum of squares in the case
of the analysis of variance.

We shall apply the method of the analysis of variance and covariance 
to the combined analysis of mental-age scores and educational-age 
scores, as measured by the Stanford Achievement Test, in the case of the 
19 pairs of identical twins reared apart. 

Let $X_{it}$ denote the educational age of the ith member of the tth twin
pair and $Y_{it}$ the mental age of the ith member of the tth twin pair. We
may then write

$$X_{it} = A + C_t + z_{it} \tag{11.37}$$

and

$$Y_{it} = B + D_t + \Delta_{it} \tag{11.38}$$

with restrictions

$$\sum_t C_t = 0 \tag{11.39}$$

$$\sum_t D_t = 0 \tag{11.40}$$

where $i = 1, 2$; $t = 1, 2, \ldots, n$; and n is the number of twin pairs.

The difference between the educational ages of the pairs of identical 
twins may be due partly or wholly to differences in mental age. The 
problem is to find out what part of these differences may be assigned 
to differences in mental age and to adjust the analysis accordingly. We 
wish to find out whether there is a difference in achievement of the 
identical twins when they may be regarded as of the same mental age. 

If we may assume that there is a linear relationship between educational
age and mental age,² we may write, since $Y_{it}$ denotes the mental
age of the ith member of the tth twin pair,

$$X_{it} = a + bY_{it} + \delta_{it} \tag{11.41}$$

where a and b are parameters to be estimated from the data; b is the
regression coefficient of educational age on mental age; $\delta_{it}$ is the measure
of the differences between the educational ages of members of the same
twin pairs and of those between the educational ages of pairs of twins
not attributable to the factor of mental age.

² For the test of linearity see page 240.

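As formerly, we minimize:

$$\phi = \sum_i \sum_t (X_{it} - a - bY_{it})^2 + 2\lambda_1 \sum_t C_t + 2\lambda_2 \sum_t D_t \tag{11.42}$$

with regard to a, b, $\lambda_1$, and $\lambda_2$ to obtain the relative minimum of $\phi$, $\chi_r^2$.
Solving for a, b, $\lambda_1$, and $\lambda_2$, we have

$$a = \frac{1}{2n}\Big(\sum_i \sum_t X_{it} - b \sum_i \sum_t Y_{it}\Big) \tag{11.43}$$

$$b = \frac{\sum_t (X_{1t}Y_{1t} + X_{2t}Y_{2t}) - \dfrac{\big[\sum_t (X_{1t} + X_{2t})\big]\big[\sum_t (Y_{1t} + Y_{2t})\big]}{2n}}{\sum_i \sum_t Y_{it}^2 - \dfrac{\big(\sum_i \sum_t Y_{it}\big)^2}{2n}} \tag{11.44}$$

$$\lambda_1 = 0 \tag{11.45}$$

$$\lambda_2 = 0 \tag{11.46}$$

Substituting the values of a, b, $\lambda_1$, and $\lambda_2$ into Equation (11.42), we
obtain

$$\chi_r^2 = \tfrac{1}{2}\sum_t (X_{1t} - X_{2t})^2 + \left[\tfrac{1}{2}\sum_t (X_{1t} + X_{2t})^2 - \frac{\big(\sum_i \sum_t X_{it}\big)^2}{2n}\right] - \frac{\left\{\sum_t (X_{1t}Y_{1t} + X_{2t}Y_{2t}) - \dfrac{\big[\sum_t (X_{1t} + X_{2t})\big]\big[\sum_t (Y_{1t} + Y_{2t})\big]}{2n}\right\}^2}{\sum_i \sum_t Y_{it}^2 - \dfrac{\big(\sum_i \sum_t Y_{it}\big)^2}{2n}} \tag{11.47}$$

$$\phantom{\chi_r^2} = \chi_a^2 + \chi_b^2, \text{ say} \tag{11.48}$$

The proportion of the variance attributable to mental age is

$$l = \frac{\left\{\sum_t (X_{1t}Y_{1t} + X_{2t}Y_{2t}) - \dfrac{\big[\sum_t (X_{1t} + X_{2t})\big]\big[\sum_t (Y_{1t} + Y_{2t})\big]}{2n}\right\}^2}{\sum_i \sum_t Y_{it}^2 - \dfrac{\big(\sum_i \sum_t Y_{it}\big)^2}{2n}} \tag{11.49}$$

To obtain $\chi_b^2$ we subtract l from the "between" pairs sum of squares,
since the other two quantities in (11.47) are the "within" and "between"
pairs sums of squares for educational age.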

The necessary calculations may be efficiently carried out if the data 
are arranged as shown in Table 62. 

The results are presented in tabular form in Table 63. 


TABLE 62

Educational and Mental Ages of 19 Pairs of Identical Twins Reared Apart

Twin    Educational age                               Mental age
pair     X_1t    X_2t   |X_1t − X_2t|  X_1t + X_2t    Y_1t    Y_2t    X_1t·Y_1t   X_2t·Y_2t   (X_1t + X_2t)(Y_1t + Y_2t)
  1       181     200        19           381          163     186      29,503      37,200         132,969
  2       131     169        38           300          126     149      16,506      25,181          82,500
  3       205     189        16           394          191     194      39,155      36,666         151,690
  4       173     207        34           380          170     204      29,410      42,228         142,120
  5       176     182         6           358          171     178      30,096      32,396         124,942
  6       151     155         4           306          195     180      29,445      27,900         114,750
  7       191     189         2           380          170     172      32,470      32,508         129,960
  8       175     162        13           337          170     142      29,750      23,004         105,144
  9       210     202         8           412          195     185      40,950      37,370         156,560
 10       181     200        19           381          187     195      33,847      39,000         145,542
 11       157     226        69           383          176     222      27,632      50,172         152,434
 12       224     210        14           434          223     210      49,952      44,100         187,922
 13       196     189         7           385          181     182      35,476      34,398         139,755
 14       176     159        17           335          164     161      28,864      25,599         108,875
 15       159     161         2           320          175     171      27,825      27,531         110,720
 16       130     131         1           261          123     120      15,990      15,720          63,423
 17       176     176         0           352          192     175      33,792      30,800         129,184
 18       192     157        35           349          184     148      35,328      23,236         115,868
 19       177     172         5           349          168     151      29,736      25,972         111,331

Total   3,361   3,436                   6,797        3,324   3,325     595,727     610,981       2,405,689

Sum of
squares 605,187 631,518    10,417     2,462,993     590,946 593,531         1,206,708



TABLE 63

Analysis of Variance and Covariance of Educational and Mental Ages

                                 Sums of squares           Sums of products
Variance                D.F.   Mental age   Educational    M.A. times E.A.    Regression    Correlation
                                               age                            coefficient   coefficient
Between pairs of twins   18     17,255.5     15,727.8        13,548.369          0.785          .822
Within pairs of twins    19      3,821.5      5,208.5         3,863.5            1.011          .866
Total                    37     21,077.0     20,936.3        17,411.9            0.827          .829


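The quantities entered in Table 63 are calculated as follows.

Educational age:

Between pairs: $\tfrac{1}{2}\sum_t (X_{1t} + X_{2t})^2 - \tfrac{1}{38}\big(\sum_i \sum_t X_{it}\big)^2 = 15{,}727.8$

Within pairs: $\tfrac{1}{2}\sum_t (X_{1t} - X_{2t})^2 = 5208.5$

Total: $\sum_i \sum_t X_{it}^2 - \tfrac{1}{38}\big(\sum_i \sum_t X_{it}\big)^2 = 20{,}936.3$

Mental age:

Between pairs: $\tfrac{1}{2}\sum_t (Y_{1t} + Y_{2t})^2 - \tfrac{1}{38}\big(\sum_i \sum_t Y_{it}\big)^2 = 17{,}255.5$

Within pairs: $\tfrac{1}{2}\sum_t (Y_{1t} - Y_{2t})^2 = 3821.5$

Total: $\sum_i \sum_t Y_{it}^2 - \tfrac{1}{38}\big(\sum_i \sum_t Y_{it}\big)^2 = 21{,}077.0$

Products of educational age by mental age:

Between pairs: $\tfrac{1}{2}\sum_t (X_{1t} + X_{2t})(Y_{1t} + Y_{2t}) - \tfrac{1}{38}\big[\sum_t (X_{1t} + X_{2t})\big]\big[\sum_t (Y_{1t} + Y_{2t})\big]$
$= \tfrac{1}{2}(2{,}405{,}689) - \tfrac{1}{38}[6797][6649] = 13{,}548.4$

Within pairs: $\sum_t X_{1t}Y_{1t} + \sum_t X_{2t}Y_{2t} - \tfrac{1}{2}\sum_t (X_{1t} + X_{2t})(Y_{1t} + Y_{2t})$
$= 1{,}206{,}708 - \tfrac{1}{2}(2{,}405{,}689) = 3{,}863.5$

Total: $\sum_t X_{1t}Y_{1t} + \sum_t X_{2t}Y_{2t} - \tfrac{1}{38}\big[\sum_t (X_{1t} + X_{2t})\big]\big[\sum_t (Y_{1t} + Y_{2t})\big]$
$= 1{,}206{,}708 - \tfrac{1}{38}[6797][6649] = 17{,}411.9$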


Two methods for adjusting the sum of squares of educational ages are
given. The first method makes possible a more nearly exact test of
significance. The adjusted sums of squares are obtained by adjusting the
"within" pairs and "total" each with its own regression coefficient and
subtracting to find the adjusted "between pairs" sum of squares. This
method is depicted in Table 64. The adjustments are as follows:

Within pairs: $5208.5 - \dfrac{(3863.5)^2}{3821.5} = 1302.5$

Total: $20{,}936.3 - \dfrac{(17{,}411.9)^2}{21{,}077.0} = 6552.2$

Between pairs: $6552.2 - 1302.5 = 5249.7$

TABLE 64

Analysis of Variance of Educational Age Scores of the 19 Pairs of Identical
Twins, Method 1: Original and Adjusted Sums of Squares and Mean Squares

                      Original analysis                 Adjusted analysis
Variance          D.F.   Sum of     Mean            D.F.   Sum of     Mean
                         squares    square                 squares    square
Between pairs      18   15,727.8   873.767           18    5249.7    291.65
Within pairs       19    5,208.5   274.132           18    1302.5     72.36
Total              37   20,936.3   565.846           36    6552.2



A second method of adjusting the sum of squares is shown in Table 65.
Both the "between pairs" and the "within pairs" sums of squares are
adjusted by the use of the "within pairs" regression coefficient:

$$\sum (x - by)^2 = \sum x^2 - 2b\sum xy + b^2 \sum y^2$$

$$= 15{,}727.8 - 2\left(\frac{3863.5}{3821.5}\right)(13{,}548.4) + \left(\frac{3863.5}{3821.5}\right)^2 (17{,}255.5)$$

$$= 15{,}727.8 - 2.02198(13{,}548.4) + 1.02210(17{,}255.5)$$

$$= 15{,}727.8 - 27{,}394.5938 + 17{,}636.8466 = 5970.053$$

In certain cases it may be necessary to adjust each sum of squares 
with its own regression coefficient (Ref. 5). 
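Both adjustments can be traced from the raw data of Table 62. The sketch below is illustrative rather than the author's procedure: the function name `partition` and the variable names are assumptions, and the within-pairs product sum is computed from the algebraically equivalent form $\tfrac{1}{2}\sum_t (X_{1t} - X_{2t})(Y_{1t} - Y_{2t})$.

```python
# (X1, X2, Y1, Y2) for each twin pair: educational ages X, mental ages Y.
PAIRS = [
    (181, 200, 163, 186), (131, 169, 126, 149), (205, 189, 191, 194),
    (173, 207, 170, 204), (176, 182, 171, 178), (151, 155, 195, 180),
    (191, 189, 170, 172), (175, 162, 170, 142), (210, 202, 195, 185),
    (181, 200, 187, 195), (157, 226, 176, 222), (224, 210, 223, 210),
    (196, 189, 181, 182), (176, 159, 164, 161), (159, 161, 175, 171),
    (130, 131, 123, 120), (176, 176, 192, 175), (192, 157, 184, 148),
    (177, 172, 168, 151),
]


def partition(u, v):
    """Between, within and total sums of products for paired data.

    u and v are lists of (first member, second member) tuples; passing the
    same list twice yields sums of squares.
    """
    n = len(u)
    grand = sum(a + b for a, b in u) * sum(c + d for c, d in v) / (2 * n)
    between = sum((a + b) * (c + d) for (a, b), (c, d) in zip(u, v)) / 2 - grand
    within = sum((a - b) * (c - d) for (a, b), (c, d) in zip(u, v)) / 2
    return between, within, between + within


X = [(p[0], p[1]) for p in PAIRS]
Y = [(p[2], p[3]) for p in PAIRS]
xx_b, xx_w, xx_t = partition(X, X)   # educational age
yy_b, yy_w, yy_t = partition(Y, Y)   # mental age
xy_b, xy_w, xy_t = partition(X, Y)   # products

# Method 1: adjust "within" and "total" by their own regressions, subtract.
adj_within = xx_w - xy_w ** 2 / yy_w
adj_total = xx_t - xy_t ** 2 / yy_t
adj_between_1 = adj_total - adj_within

# Method 2: adjust "between" with the within-pairs regression coefficient.
b_w = xy_w / yy_w
adj_between_2 = xx_b - 2 * b_w * xy_b + b_w ** 2 * yy_b

print(round(adj_within, 1), round(adj_between_1, 1), round(adj_between_2, 1))
```

Small differences from the printed figures in the last decimal place are to be expected, since the hand computation carried rounded intermediate values.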


TABLE 65

Analysis of Variance of Educational Age Scores of the 19 Pairs of Identical
Twins, Method 2: Original and Adjusted Sums of Squares and Mean Squares

                      Original analysis                 Adjusted analysis
Variance          D.F.   Sum of     Mean            D.F.   Sum of      Mean
                         squares    square                 squares     square
Between pairs      18   15,727.8   873.767           18    5970.053   331.670
Within pairs       19    5,208.5   274.132           18    1302.500    72.361
Total              37   20,936.3   565.846           36



The adjusted between pairs sum of squares and mean square give
a measure of the difference between twin pairs in educational age freed
from the influence of mental age. To test the hypothesis that these
adjusted differences are zero, we calculate:

$$z = \tfrac{1}{2}\log_e \frac{\text{mean square between pairs}}{\text{mean square within pairs}}$$

and refer to Fisher's tables of z with degrees of freedom $n_1 = n - 1$ and
$n_2 = n - 1$, where n is the number of twin pairs. In our example we
have

$$z_0 = \tfrac{1}{2}\log_e \frac{291.65}{72.36} \quad\text{or}\quad \tfrac{1}{2}\log_e 4.03 = .697 \quad \text{(Table 64)}$$

$$\text{or}\quad z_0 = \tfrac{1}{2}\log_e \frac{331.670}{72.361} \quad\text{or}\quad \tfrac{1}{2}\log_e 4.584 = .761 \quad \text{(Table 65)}$$

From Fisher's tables of z, entered with degrees of freedom $n_1 = 18$
and $n_2 = 18$, we find that $z_0$ is greater than the value given in the table
at the 1 per cent level. We could also have used the tables of Snedecor's
F. We reject the hypothesis and conclude that when the factor of mental
age is removed, the means of the educational ages of twin pairs differ
significantly.
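Since z is simply half the natural logarithm of the variance ratio, the two values of $z_0$ follow directly from the adjusted mean squares of Tables 64 and 65. A minimal sketch, with an illustrative function name:

```python
import math


def fisher_z(ms_between, ms_within):
    """Fisher's z: half the natural logarithm of the ratio of mean squares."""
    return 0.5 * math.log(ms_between / ms_within)


z_method_1 = fisher_z(291.65, 72.36)     # adjusted mean squares, Table 64
z_method_2 = fisher_z(331.670, 72.361)   # adjusted mean squares, Table 65

print(round(z_method_1, 3), round(z_method_2, 3))
```

The equivalent F values are recovered as exp(2z), which is why the tables of F could be used in place of the tables of z.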

We obtain three measures of the degree of relationship between
educational age and mental age from the results of Table 63. From the
first row, for between pairs, we have

$$r_1 = \frac{13{,}548.369}{\sqrt{(17{,}255.5)(15{,}727.8)}} = \frac{13{,}548.369}{\sqrt{271{,}391{,}052.9}} = .822$$

From the second row, for within pairs, we have

$$r_2 = \frac{3863.5}{\sqrt{(3821.5)(5208.5)}} = .866$$

From the third row, for the total, we have

$$r_3 = \frac{17{,}411.9}{\sqrt{(21{,}077)(20{,}936.3)}} = \frac{17{,}411.9}{21{,}006} = .829$$

The second, $r_2 = .866$, is the best measure of the degree of relationship;
in the third, $r_3 = .829$, the relationship is masked by the inclusion
of the between-pairs differences in educational age and in mental age.
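Each of the three coefficients is the sum of products divided by the square root of the product of the two sums of squares in the same row of Table 63. A short sketch, with illustrative names, reproduces them:

```python
import math


def correlation(sum_products, ss_first, ss_second):
    """Correlation coefficient from a sum of products and two sums of squares."""
    return sum_products / math.sqrt(ss_first * ss_second)


# Rows of Table 63: (sum of products, S.S. mental age, S.S. educational age).
rows = {
    "between": (13548.369, 17255.5, 15727.8),
    "within": (3863.5, 3821.5, 5208.5),
    "total": (17411.9, 21077.0, 20936.3),
}
r = {name: correlation(*values) for name, values in rows.items()}

for name, value in r.items():
    print(name, round(value, 3))
```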

Problem XI.4. Test of the linearity of regression. The statistical 
study of the relationship between two or more variables involves consid¬ 
eration of the kind of relationship existing among them. Regression 
may be linear or nonlinear, and it is essential in any problem involving 
the use of regression to determine which particular kind best represents 
the observational data. The statistical method of correlation, particu¬ 
larly the product-moment correlation coefficient, involves the assumption 
of linearity of regression. The analysis of variance provides a straight¬ 
forward method of testing the type of regression. Since linear regression 
is the type most often encountered, we shall consider here the problem of 
testing the linearity of regression (Ref. 5).³

³ For other cases of polynomial equations and especially for the separation of sums
of squares corresponding to individual degrees of freedom where the independent
effects are represented by polynomials of different degree, see page 309.

We may take as a practical problem the case presented in Table 66,
that of determining whether the relationship between the scores of the
same individuals on two tests, one administered prior and the other
subsequent to instruction, was linear in form. We shall also test another
assumption underlying the product-moment correlation method, the
homoscedasticity of the different arrays, that is, whether the
variances of the different arrays are equal.

TABLE 66

Correlation Table for the Initial and Final Scores of 263 Students on a Test
in College Biology

[Body of the correlation table not recoverable; columns are classes of X (initial score).]

Let X and Y represent the scores on the initial and final tests,
respectively. Then the regression function, when linear, is given by

$$\hat{Y} = a + b(X - \bar{X}) \tag{11.50}$$

where a and b are two parameter values, the value chosen for a being the
mean, $\bar{Y}$, of the observed values Y, and the value given to b being the
estimate of the regression coefficient of Y on X. $\hat{Y}$ is the expected
value of Y for each X, and $\bar{X}$ is the mean of the X values.

In Table 66 the data are grouped, and we shall take as the selected
values of X the mid-points of the several class intervals as shown in Table
67. It is observed in Table 66 that for each X the several values of Y
form an array. Then, letting $Y_{st}$ represent the score on the final test
of the tth individual in the sth array, we have

$$Y_{st} = A + B_s + z_{st} \tag{11.51}$$

where $t = 1, 2, \ldots, n_s$; $s = 1, 2, \ldots, k$; k denotes the number of
arrays and $n_s$ the number of individuals in the sth array. A is the
mean of the scores of all individuals on the final test; $B_s$ gives the measure
of achievement on the final test of all individuals in the sth array; and
$z_{st}$ represents the measure of residual variation or the portion of $Y_{st}$
attributable to random factors, such as errors of measurement, which are
independent of X. $z_{st}$ is assumed to be normally distributed about 0
with standard deviation $\sigma$, supposed to be the same for all arrays. The
latter assumption will be tested first, by using the $L_1$-test. The sums
and sums of squares of the scores on the final test in each array are given
in Table 67. From Welch's formula for the $L_1$-test,

$$L_1 = \frac{N \prod_s \left(\theta'_s / n_s\right)^{n_s/N}}{\sum_s \theta'_s} \tag{11.52}$$

we have

$$\log L_1 = \log N - \frac{1}{N}\sum_s n_s \log n_s + \frac{1}{N}\sum_s n_s \log \theta'_s - \log \sum_s \theta'_s \tag{11.53}$$


TABLE 67

Sum and Sum of Squares of Final Scores in Each Array

Array   Value of           Sum of scores   Sum of squares of scores
 No.      X_s      n_s       Σ_t Y_st            Σ_t Y²_st           (Σ_t Y_st)²/n_s   Σ_t Y²_st − (Σ_t Y_st)²/n_s
 (1)      (2)      (3)         (4)                 (5)                    (6)                    (7)
   1     10.5       12        318.0              8,659.00               8,427.00               232.00
   2     12.5       10        265.0              7,286.50               7,022.50               264.00
   3     14.5       14        407.0             12,379.50              11,832.07               547.43
   4     16.5       30        841.0             24,149.50              23,576.03               573.47
   5     18.5       41       1190.5             35,408.25              34,565.62               842.63
   6     20.5       33       1016.5             32,016.25              31,312.80               703.45
   7     22.5       29        852.5             25,809.25              25,060.56               748.69
   8     24.5       30        919.0             29,023.25              28,152.03               871.22
   9     26.5       32       1028.0             33,484.00              33,024.50               459.50
  10     28.5       22        713.0             23,447.50              23,107.68               339.82
  11     30.5       10        325.0             10,930.50              10,562.50               368.00
Total              263       7875.5            242,593.50             236,643.29              5950.21


In our problem, as shown in Table 68, we find

$$\log L_1 = 2.4200 - 1.4219 + 2.7593 - 3.7745 = 9.9829 - 10$$

So we obtain $L_1 = .961$. Referring to Nayer's tables with k = 11 and
harmonic mean,

$$f_s = \frac{11}{\frac{1}{11} + \frac{1}{9} + \frac{1}{13} + \frac{1}{29} + \frac{1}{40} + \frac{1}{32} + \frac{1}{28} + \frac{1}{29} + \frac{1}{31} + \frac{1}{21} + \frac{1}{9}} = 18$$

TABLE 68

Calculation of L_1 for the Test of the Hypothesis H_0: σ_s = σ

  f_s    n_s    log n_s    n_s log n_s     θ'_s     log θ'_s    n_s log θ'_s
   11     12    1.0792       12.9504      232.00     2.3655        28.3860
    9     10    1.0000       10.0000      264.00     2.4216        24.2160
   13     14    1.1461       16.0454      547.43     2.7383        38.3362
   29     30    1.4771       44.3130      573.47     2.7585        82.7550
   40     41    1.6128       66.1248      842.63     2.9256       119.9496
   32     33    1.5185       50.1105      703.45     2.8472        93.9576
   28     29    1.4624       42.4096      748.69     2.8742        83.3518
   29     30    1.4771       44.3130      871.22     2.9401        88.2030
   31     32    1.5051       48.1632      459.50     2.6623        85.1936
   21     22    1.3424       29.5328      339.82     2.5313        55.6886
    9     10    1.0000       10.0000      368.00     2.5658        25.6580

         263              Σ n_s log n_s   5950.21               Σ n_s log θ'_s
                            = 373.9627                            = 725.6954

we find that the value of $L_1$ is greater than the tabled value at the 5 per
cent level, so we may assume that $\sigma_s$ is constant. The first analysis of
the scores for the final test is given in Table 69.


TABLE 69

Analysis of Variance of Scores on Final Test

Source of variation         D.F.   Sum of squares   Mean square
Between means of arrays      10        812.49          81.249
Within arrays               252       5950.21          16.904
Total                       262       6762.70


The analysis consists in breaking up the total sum of squares into two 
components. One component gives the mean-square estimate of the 
population variance between means of arrays and the other the mean- 
square estimate within arrays. The respective mean squares are given 
in the analysis-of-variance table. The sums of squares for each source 
of variation are obtained as follows, making use of the totals recorded in 
Table 67. 

The within-arrays sum of squares is the total of column (7), 5950.21; the
between-means of arrays sum of squares is obtained from the totals of
columns (4) and (6):

$$236{,}643.29 - \frac{(7875.5)^2}{263} = 812.49$$

and the total sum of squares is calculated from the totals of columns (4)
and (5):

$$242{,}593.50 - \frac{(7875.5)^2}{263} = 6762.70$$

The hypothesis $H_1$, that the regression of Y on X is linear, is stated as
follows:

$$H_1: \hat{Y}_s = a + b(X_s - \bar{X}) \tag{11.54}$$

where $\hat{Y}_s$ is the expected value of Y for $X_s$, the sth value of X. If $H_1$
is accepted, then Equation (11.51) may be written

$$Y_{st} = a + b(X_s - \bar{X}) + z_{st} \tag{11.55}$$

$H_1$ may then be tested in the conventional manner of testing a linear
hypothesis. We have

$$\chi^2 = \sum_s \sum_t (Y_{st} - A - B_s)^2 \tag{11.56}$$

We then minimize $\chi^2$ with respect to all parameters to get the absolute
minimum $\chi_a^2$. Thus:

$$\chi_a^2 = \sum_s \left[\sum_t Y_{st}^2 - \frac{\left(\sum_t Y_{st}\right)^2}{n_s}\right] \tag{11.57}$$

which gives the sum of squares within arrays.

We then minimize $\chi^2$ with respect to the parameters remaining under
the assumption that $H_1$ is true. Thus, minimize

$$\chi^2 = \sum_s \sum_t \left[Y_{st} - a - b(X_s - \bar{X})\right]^2 \tag{11.58}$$

with respect to a and b to get the relative minimum, $\chi_r^2$. We get

$$a = \frac{\sum_s \sum_t Y_{st}}{\sum_s n_s} = \bar{Y} \tag{11.59}$$

$$b = \frac{\sum_s \left[(X_s - \bar{X})\left(\sum_t Y_{st}\right)\right]}{\sum_s \left[n_s (X_s - \bar{X})^2\right]} \tag{11.60}$$

$$\chi_r^2 = \sum_s \sum_t Y_{st}^2 - \frac{\left(\sum_s \sum_t Y_{st}\right)^2}{\sum_s n_s} - \frac{\left\{\sum_s \left[(X_s - \bar{X})\left(\sum_t Y_{st}\right)\right]\right\}^2}{\sum_s \left[n_s (X_s - \bar{X})^2\right]} = \chi_a^2 + \chi_b^2, \text{ say} \tag{11.61}$$

$\chi_b^2$ is observed as equal to the "between means of arrays" sum of squares
minus the quantity l, where

$$l = \frac{\left\{\sum_s \left[(X_s - \bar{X})\left(\sum_t Y_{st}\right)\right]\right\}^2}{\sum_s \left[n_s (X_s - \bar{X})^2\right]} \tag{11.62}$$

We now test the hypothesis $H_1$ by calculating

$$F = \frac{\chi_b^2 / (k - 2)}{\chi_a^2 / \left(\sum_s n_s - k\right)} \tag{11.63}$$

and then refer to Snedecor's tables of F (Table IV, Appendix) with
$n_1 = k - 2$ and $n_2 = \sum_s n_s - k$.

The components with the corresponding calculated values are then
entered in an analysis-of-variance table. The quantity l is entered as
the variation "due to linear regression" and $\chi_b^2$ as the variation "due to
departure from linear regression."

We now proceed to calculate l using the values recorded in Table 70,
from which we get

$$l = \frac{(452.70)^2}{7046.68} = 29.08$$

Finally, the complete analysis in our problem is summarized in Table
71. For the test of the hypothesis $H_1$, we obtain

$$F_0 = \frac{87.046}{16.904} = 5.15$$

We enter the F-tables with $n_1 = 9$ and $n_2 = 252$ and find that $F_0$ is
greater than the interpolated value of F at the 1 per cent level.
Therefore, we reject the hypothesis $H_1$ and conclude that the regression of Y
on X is nonlinear in form.

TABLE 70

Calculation of the Value of l

  n_s     X_s     X_s − X̄     Σ_t Y_st    (X_s − X̄)(Σ_t Y_st)    n_s(X_s − X̄)²
   12    10.5      −10.6        318.0         −3370.80              1348.32
   10    12.5       −8.6        265.0         −2279.00               739.60
   14    14.5       −6.6        407.0         −2686.20               609.84
   30    16.5       −4.6        841.0         −3868.60               634.80
   41    18.5       −2.6       1190.5         −3095.30               277.16
   33    20.5       −0.6       1016.5          −609.90                11.88
   29    22.5        1.4        852.5          1193.50                56.84
   30    24.5        3.4        919.0          1286.60               346.80
   32    26.5        5.4       1028.0          5551.20               933.12
   22    28.5        7.4        713.0          5276.20              1204.72
   10    30.5        9.4        325.0          3055.00               883.60

  263   X̄ = 21.1                                452.70              7046.68


TABLE 71

Analysis of Variance of Scores on Final Test: Complete Analysis

Source of variation                   D.F.   Sum of squares   Mean square
Linear regression                       1         29.08          29.080
Departure from linear regression        9        783.41          87.046
Within arrays                         252       5950.21          16.904
Total                                 262       6762.70



The same methods could be used in testing the form of regression of 
X on Y. 
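The steps of the linearity test can be collected into one routine. The sketch below follows Equations (11.57) and (11.60) to (11.63); the function name, the dictionary layout, and the small data set are hypothetical and serve only to show the mechanics, not to reproduce Table 66.

```python
def linearity_f_test(arrays):
    """Test of linearity of regression of Y on X.

    `arrays` maps each selected X value to the list of Y scores in that array.
    """
    k = len(arrays)
    n_total = sum(len(ys) for ys in arrays.values())
    grand_sum = sum(sum(ys) for ys in arrays.values())
    x_bar = sum(x * len(ys) for x, ys in arrays.items()) / n_total

    # Within arrays: sum over arrays of sum(Y^2) - (sum Y)^2 / n_s.
    within = sum(sum(y * y for y in ys) - sum(ys) ** 2 / len(ys)
                 for ys in arrays.values())
    # Between means of arrays.
    between = (sum(sum(ys) ** 2 / len(ys) for ys in arrays.values())
               - grand_sum ** 2 / n_total)
    # l: the part of "between" due to linear regression.
    numerator = sum((x - x_bar) * sum(ys) for x, ys in arrays.items())
    denominator = sum(len(ys) * (x - x_bar) ** 2 for x, ys in arrays.items())
    linear = numerator ** 2 / denominator
    departure = between - linear

    f_ratio = (departure / (k - 2)) / (within / (n_total - k))
    return {"linear": linear, "departure": departure, "within": within,
            "df": (k - 2, n_total - k), "F": f_ratio}


# Hypothetical data: three arrays whose means (2, 6, 4) do not lie on a line.
example = {1: [1, 3], 2: [5, 7], 3: [3, 5]}
out = linearity_f_test(example)
print(out["linear"], out["departure"], out["df"], out["F"])
```

Exchanging the roles of the two variables in the input gives the corresponding test for the regression of X on Y.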

Problem XI.5. The complete procedures for the analysis of variance
and covariance for the data of a single classification. The following
problem illustrates how to calculate all the numerical values needed in a
complete analysis of variance and covariance in the case of a single
criterion of classification, how to proceed with the application of
principles including the testing of underlying assumptions, and how to
interpret the results. We wish to systematize the operations involved in
the analysis in the most efficient way.

The primary data are given in Table 72, which gives the initial and
final scores on a test of educational development, and the mental ages of
54 high-school students classified by grades, 18 students in each of the
tenth, eleventh, and twelfth grades.

We wish to test the hypothesis that educational development is
independent of grade, that is, that the mean achievements of students
in the three grades are equal. The complete analysis is in three parts:

In Part I, we calculate all the values required for the complete
analysis and carry out the analysis of variance on the final test scores;

In Part II, we give the complete procedures for the analysis of variance
and covariance with one independent variable;

In Part III, we present the complete procedures for the analysis of
variance and covariance with two independent variables.


TABLE 72

Mental Ages, Initial and Final Scores on an Educational Development Test
of 54 High-School Students Classified by Grade*

       Grade 10                  Grade 11                  Grade 12
Final   M.A.   Initial    Final   M.A.   Initial    Final   M.A.   Initial
Y_1t    X_1t    Z_1t      Y_2t    X_2t    Z_2t      Y_3t    X_3t    Z_3t
 30      45      28        26      62      22        29      60      25
 25      58      22        26      57      21        29      88      24
 22      46      19        24      65      21        22      64      19
 26      56      22        24      54      25        23      64      21
 17      19      14        23      55      18        20      47      17
 14      29      14        15      24      13        19      75      17
 18      34      18        18      40      17        17      29      16
 17      17      14        16      24      13        15      38      15
 12      19       9        13      23      12        14      28      12
 21      44      16        26      60      22        33      94      29
 21      44      21        25      57      22        29      89      29
 19       6      17        23      52      19        25      78      22
 20      38      18        22      54      19        23      50      21
 18      27      16        21      54      19        18      57      19
 14      18      14        17      52      16        17      43      17
 14      18       9        19      40      17        15      36      13
 12      18       7        15      28      12        15      35      14
  9       5       7        13      48      12        10      14       9

* Mental age, in terms of months, has been reduced by 100. Define $Y_{st}$, $X_{st}$, and
$Z_{st}$ as the final, mental age, and initial scores, respectively, for the tth individual in
the sth group; where s = 1, 2, 3, denoting grade 10, 11, 12, respectively, and t = 1,
2, . . . , 18.


Part I 

Step 1. Calculate the following values: 

(Some of the values reported here were calculated for later use and 
need not be considered in the analysis-of-variance procedure.) 

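$a_{11} = \sum_t Y_{1t}^2 = 900 + \cdots + 81 = 6511$
$a_{21} = \sum_t Y_{2t}^2 = 676 + \cdots + 169 = 7806$
$a_{31} = \sum_t Y_{3t}^2 = 841 + \cdots + 100 = 8413$

$a_{12} = \sum_t X_{1t}^2 = 2025 + \cdots + 25 = 20{,}727$
$a_{22} = \sum_t X_{2t}^2 = 3844 + \cdots + 2304 = 43{,}317$
$a_{32} = \sum_t X_{3t}^2 = 3600 + \cdots + 196 = 63{,}595$

$a_{13} = \sum_t Z_{1t}^2 = 784 + \cdots + 49 = 5047$
$a_{23} = \sum_t Z_{2t}^2 = 484 + \cdots + 144 = 5970$
$a_{33} = \sum_t Z_{3t}^2 = 625 + \cdots + 81 = 6909$

$a_{14} = \sum_t (Y_{1t}X_{1t}) = 1350 + \cdots + 45 = 11{,}099$
$a_{24} = \sum_t (Y_{2t}X_{2t}) = 1612 + \cdots + 624 = 18{,}169$
$a_{34} = \sum_t (Y_{3t}X_{3t}) = 1740 + \cdots + 140 = 22{,}737$

$a_{15} = \sum_t (Y_{1t}Z_{1t}) = 840 + \cdots + 63 = 5701$
$a_{25} = \sum_t (Y_{2t}Z_{2t}) = 572 + \cdots + 156 = 6808$
$a_{35} = \sum_t (Y_{3t}Z_{3t}) = 725 + \cdots + 90 = 7607$

$a_{16} = \sum_t (X_{1t}Z_{1t}) = 1260 + \cdots + 35 = 9756$
$a_{26} = \sum_t (X_{2t}Z_{2t}) = 1364 + \cdots + 576 = 15{,}884$
$a_{36} = \sum_t (X_{3t}Z_{3t}) = 1500 + \cdots + 126 = 20{,}587$

$c_{11} = \big(\sum_t Y_{1t}\big)^2 / 18 = (329)^2 / 18 = 6013$
$c_{21} = \big(\sum_t Y_{2t}\big)^2 / 18 = (366)^2 / 18 = 7442$
$c_{31} = \big(\sum_t Y_{3t}\big)^2 / 18 = (373)^2 / 18 = 7729$

$c_{12} = \big(\sum_t X_{1t}\big)^2 / 18 = (541)^2 / 18 = 16{,}260$
$c_{22} = \big(\sum_t X_{2t}\big)^2 / 18 = (849)^2 / 18 = 40{,}045$
$c_{32} = \big(\sum_t X_{3t}\big)^2 / 18 = (989)^2 / 18 = 54{,}340$

$c_{13} = \big(\sum_t Z_{1t}\big)^2 / 18 = (285)^2 / 18 = 4513$
$c_{23} = \big(\sum_t Z_{2t}\big)^2 / 18 = (320)^2 / 18 = 5689$
$c_{33} = \big(\sum_t Z_{3t}\big)^2 / 18 = (339)^2 / 18 = 6385$

$c_{14} = \big(\sum_t Y_{1t}\big)\big(\sum_t X_{1t}\big) / 18 = 329(541)/18 = 9888$
$c_{24} = \big(\sum_t Y_{2t}\big)\big(\sum_t X_{2t}\big) / 18 = 366(849)/18 = 17{,}263$
$c_{34} = \big(\sum_t Y_{3t}\big)\big(\sum_t X_{3t}\big) / 18 = 373(989)/18 = 20{,}494$

$c_{15} = \big(\sum_t Y_{1t}\big)\big(\sum_t Z_{1t}\big) / 18 = 329(285)/18 = 5209$
$c_{25} = \big(\sum_t Y_{2t}\big)\big(\sum_t Z_{2t}\big) / 18 = 366(320)/18 = 6507$
$c_{35} = \big(\sum_t Y_{3t}\big)\big(\sum_t Z_{3t}\big) / 18 = 373(339)/18 = 7025$

$c_{16} = \big(\sum_t X_{1t}\big)\big(\sum_t Z_{1t}\big) / 18 = 541(285)/18 = 8566$
$c_{26} = \big(\sum_t X_{2t}\big)\big(\sum_t Z_{2t}\big) / 18 = 849(320)/18 = 15{,}093$
$c_{36} = \big(\sum_t X_{3t}\big)\big(\sum_t Z_{3t}\big) / 18 = 989(339)/18 = 18{,}626$

$d_1 = \big(\sum_s \sum_t Y_{st}\big)^2 / 54 = (1068)^2 / 54 = 21{,}123$
$d_2 = \big(\sum_s \sum_t X_{st}\big)^2 / 54 = (2379)^2 / 54 = 104{,}808$
$d_3 = \big(\sum_s \sum_t Z_{st}\big)^2 / 54 = (944)^2 / 54 = 16{,}503$
$d_4 = \big(\sum_s \sum_t Y_{st}\big)\big(\sum_s \sum_t X_{st}\big) / 54 = 1068(2379)/54 = 47{,}051$
$d_5 = \big(\sum_s \sum_t Y_{st}\big)\big(\sum_s \sum_t Z_{st}\big) / 54 = 1068(944)/54 = 18{,}670$
$d_6 = \big(\sum_s \sum_t X_{st}\big)\big(\sum_s \sum_t Z_{st}\big) / 54 = 2379(944)/54 = 41{,}588$

$a_1 = \sum_s \sum_t Y_{st}^2 = a_{11} + a_{21} + a_{31} = 22{,}730$
$a_2 = \sum_s \sum_t X_{st}^2 = a_{12} + a_{22} + a_{32} = 127{,}639$
$a_3 = \sum_s \sum_t Z_{st}^2 = a_{13} + a_{23} + a_{33} = 17{,}926$
$a_4 = \sum_s \sum_t (Y_{st}X_{st}) = a_{14} + a_{24} + a_{34} = 52{,}005$
$a_5 = \sum_s \sum_t (Y_{st}Z_{st}) = a_{15} + a_{25} + a_{35} = 20{,}116$
$a_6 = \sum_s \sum_t (X_{st}Z_{st}) = a_{16} + a_{26} + a_{36} = 46{,}227$

$c_1 = \sum_s \big(\sum_t Y_{st}\big)^2 / 18 = c_{11} + c_{21} + c_{31} = 21{,}184$
$c_2 = \sum_s \big(\sum_t X_{st}\big)^2 / 18 = c_{12} + c_{22} + c_{32} = 110{,}645$
$c_3 = \sum_s \big(\sum_t Z_{st}\big)^2 / 18 = c_{13} + c_{23} + c_{33} = 16{,}587$
$c_4 = \sum_s \big[\big(\sum_t Y_{st}\big)\big(\sum_t X_{st}\big)\big] / 18 = c_{14} + c_{24} + c_{34} = 47{,}645$
$c_5 = \sum_s \big[\big(\sum_t Y_{st}\big)\big(\sum_t Z_{st}\big)\big] / 18 = c_{15} + c_{25} + c_{35} = 18{,}741$
$c_6 = \sum_s \big[\big(\sum_t X_{st}\big)\big(\sum_t Z_{st}\big)\big] / 18 = c_{16} + c_{26} + c_{36} = 42{,}285$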

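Step 2. Calculate the sum of squares of y for each group. Define

\[
\theta_s = \sum_t (Y_{st} - \bar{Y}_s)^2 = \sum_t y_{st}^2,
\qquad \text{where } \bar{Y}_s = \frac{\sum_t Y_{st}}{18}.
\]

It is obvious that

\[
\theta_s = \sum_t Y_{st}^2 - \frac{(\sum_t Y_{st})^2}{18}.
\]

Therefore, we have

\[
\begin{aligned}
\theta_1 &= a_{11} - c_{11} = 498 = \sum_t y_{1t}^2 \\
\theta_2 &= a_{21} - c_{21} = 364 = \sum_t y_{2t}^2 \\
\theta_3 &= a_{31} - c_{31} = 684 = \sum_t y_{3t}^2
\end{aligned}
\]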
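The identity in Step 2 is easy to check numerically. The following Python sketch recomputes the three group sums of squares from the raw sums reported in Step 1; the function name `corrected_sum_of_squares` is an illustrative choice, not notation from the text.

```python
def corrected_sum_of_squares(raw_sum_sq, raw_sum, n):
    """Sum of squared deviations from the mean: sum(Y^2) - (sum(Y))^2 / n."""
    return raw_sum_sq - raw_sum ** 2 / n


# (sum of Y^2, sum of Y) for grades 10, 11, 12, taken from Step 1.
raw = {1: (6511, 329), 2: (7806, 366), 3: (8413, 373)}

# theta[s] is the within-group sum of squares of y for group s (18 pupils each).
theta = {s: corrected_sum_of_squares(ss, total, 18) for s, (ss, total) in raw.items()}
```

Rounded to whole numbers, the three entries of `theta` reproduce the values 498, 364, and 684 given above.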

Step 3. Use the $L_1$-criterion to test the hypothesis $H_1\colon \sigma_s = \sigma$. The
calculations involved are summarized in Table 73.

Step 4. Calculate the following values for the analysis of variance
of y. The sums of squares for the different sources of variation are
(see Step 1):

(1) Within grades = $a_1 - c_1 = 1546$

(2) Between grades = $c_1 - d_1 = 61$

(3) Total = $a_1 - d_1 = 1607$

TABLE 73

$L_1$-Calculations for $H_1\colon \sigma_s = \sigma$

  f_s    n_s    log n_s    n_s log n_s    θ_s     log θ_s    n_s log θ_s
  17     18     1.2553     ...            498     2.6972     ...
  17     18     1.2553     ...            364     2.5611     ...
  17     18     1.2553     ...            684     2.8351     ...
  51     54                67.7862        1546               145.6812

\[
\begin{aligned}
\log L_1 &= \log 54 - \tfrac{1}{54}(67.7862) + \tfrac{1}{54}(145.6812) - \log 1546 \\
         &= 1.7324 - 1.2553 + 2.6978 - 3.1892 \\
         &= 9.9857 - 10 \\
L_1 &= .968
\end{aligned}
\]

Refer to Nayer's tables of $L_1$ (Table V, Appendix) with $k = 3$ and degrees of
freedom $f = 17$. We have $P > .05$. Therefore we accept $H_1$. Assuming that the
three groups have common variance, we may combine the results.
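The logarithmic form of the $L_1$-criterion used in Table 73 can be sketched in a few lines of Python. The function name `l1_criterion` is hypothetical; the formula is the one written out under the table, with common (base-10) logarithms.

```python
import math


def l1_criterion(n, theta):
    """Return L1 from group sizes n_s and within-group sums of squares theta_s.

    log L1 = log N - (1/N) * sum(n_s log n_s)
             + (1/N) * sum(n_s log theta_s) - log(sum(theta_s)),
    where N is the total number of observations.
    """
    big_n = sum(n)
    log_l1 = (
        math.log10(big_n)
        - sum(ns * math.log10(ns) for ns in n) / big_n
        + sum(ns * math.log10(th) for ns, th in zip(n, theta)) / big_n
        - math.log10(sum(theta))
    )
    return 10 ** log_l1


# Three grades of 18 pupils each, with the theta_s values from Step 2.
l1 = l1_criterion([18, 18, 18], [498, 364, 684])
```

With these inputs `l1` agrees with the tabulated .968 to three decimals. The probability level still has to be read from Nayer's tables; the sketch only reproduces the arithmetic.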


Step 5. Analysis of variance to test the hypothesis $H_0\colon \bar{Y}_s = \bar{Y}$.
The results are summarized in Table 74.


TABLE 74

Analysis of Variance of Final Score for Different Grades

  Source of variation    D.F.    S.S.    Mean square    F       Hypothesis tested
  Within grades           51     1546    30.31
  Between grades           2       61    30.50          1.01    Accepted
  Total                   53     1607

where

\[
F = \frac{\text{mean square between grades}}{\text{mean square within grades}}.
\]

Refer to Snedecor's tables of $F$ (Table IV, Appendix) with $n_1 = 2$ and $n_2 = 51$.
We have $P > .05$. Therefore, we accept the hypothesis $H_0$ and conclude that there
are no significant differences among the means of the three grades.
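As a check on Table 74, the mean squares and the variance ratio follow directly from the sums of squares in Step 4. The sketch below assumes nothing beyond those numbers; `variance_ratio` is an illustrative helper name.

```python
def variance_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and degrees of freedom from Step 4 and Table 74.
ms_between, ms_within, f_ratio = variance_ratio(61, 2, 1546, 51)
```

Here `f_ratio` is close to 1, which is why the hypothesis of equal grade means is retained before any adjustment for mental age.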


Part II 

Complete procedures for the analysis of variance and covariance with 
one independent variable. 

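Step 1. Calculate the following values (see Part I, Steps 1 and 2):

\[
\begin{aligned}
\sum_t x_{1t}^2 &= \sum_t X_{1t}^2 - \frac{(\sum_t X_{1t})^2}{18} = a_{12} - c_{12} = 4467 \\
\sum_t x_{2t}^2 &= \sum_t X_{2t}^2 - \frac{(\sum_t X_{2t})^2}{18} = a_{22} - c_{22} = 3272 \\
\sum_t x_{3t}^2 &= \sum_t X_{3t}^2 - \frac{(\sum_t X_{3t})^2}{18} = a_{32} - c_{32} = 9255 \\
\sum_t (y_{1t}x_{1t}) &= \sum_t (Y_{1t}X_{1t}) - \frac{(\sum_t Y_{1t})(\sum_t X_{1t})}{18} = a_{14} - c_{14} = 1211 \\
\sum_t (y_{2t}x_{2t}) &= \sum_t (Y_{2t}X_{2t}) - \frac{(\sum_t Y_{2t})(\sum_t X_{2t})}{18} = a_{24} - c_{24} = 906 \\
\sum_t (y_{3t}x_{3t}) &= \sum_t (Y_{3t}X_{3t}) - \frac{(\sum_t Y_{3t})(\sum_t X_{3t})}{18} = a_{34} - c_{34} = 2243
\end{aligned}
\]

From Part I, Step 2, we get

\[
\sum_t y_{1t}^2 = 498, \qquad \sum_t y_{2t}^2 = 364, \qquad \sum_t y_{3t}^2 = 684.
\]

Then, we have

\[
\begin{aligned}
M_1 &= \frac{[\sum_t (y_{1t}x_{1t})]^2}{\sum_t x_{1t}^2} = \frac{(1211)^2}{4467} = 328 \\
M_2 &= \frac{[\sum_t (y_{2t}x_{2t})]^2}{\sum_t x_{2t}^2} = \frac{(906)^2}{3272} = 251 \\
M_3 &= \frac{[\sum_t (y_{3t}x_{3t})]^2}{\sum_t x_{3t}^2} = \frac{(2243)^2}{9255} = 544
\end{aligned}
\]

Define

\[
\text{Adjusted } \sum_t y_{st}^2 = \sum_t (y_{st} - b_s x_{st})^2,
\qquad \text{where } b_s = \frac{\sum_t (y_{st}x_{st})}{\sum_t x_{st}^2}.
\]

By simple algebraic operation, we have

\[
\text{Adjusted } \sum_t y_{st}^2 = \sum_t y_{st}^2 - \frac{[\sum_t (y_{st}x_{st})]^2}{\sum_t x_{st}^2} = \sum_t y_{st}^2 - M_s.
\]

Define

\[
\theta'_s = \text{adjusted } \sum_t y_{st}^2.
\]

Therefore, we have

\[
\begin{aligned}
\theta'_1 &= \sum_t y_{1t}^2 - M_1 = 170 \\
\theta'_2 &= \sum_t y_{2t}^2 - M_2 = 113 \\
\theta'_3 &= \sum_t y_{3t}^2 - M_3 = 140
\end{aligned}
\]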
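The per-group adjustment in this step amounts to removing the part of each sum of squares of y that is accounted for by the within-group regression on x. A minimal Python sketch, using only the deviation sums listed above (the dictionary keys and the function name are illustrative):

```python
def adjust_for_one_covariate(sum_yy, sum_xx, sum_yx):
    """Return (b, M, adjusted sum of squares) for one group.

    b is the within-group regression coefficient of y on x,
    M = (sum yx)^2 / (sum x^2) is the sum of squares due to regression,
    and the adjusted sum of squares is sum y^2 - M.
    """
    b = sum_yx / sum_xx
    m = sum_yx ** 2 / sum_xx
    return b, m, sum_yy - m


# Deviation sums of squares and products for the three grades.
groups = {
    1: {"sum_yy": 498, "sum_xx": 4467, "sum_yx": 1211},
    2: {"sum_yy": 364, "sum_xx": 3272, "sum_yx": 906},
    3: {"sum_yy": 684, "sum_xx": 9255, "sum_yx": 2243},
}

adjusted = {s: adjust_for_one_covariate(**sums)[2] for s, sums in groups.items()}
```

Rounded to whole numbers, `adjusted` gives 170, 113, and 140, the adjusted sums of squares used in the next step.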

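Step 2. Use the $L_1$-criterion to test the hypothesis $H'_1\colon \sigma_{y_s \cdot x_s} = \sigma_{y \cdot x}$.
The calculations involved are summarized in Table 75.


TABLE 75

$L_1$-Calculations for $H'_1\colon \sigma_{y_s \cdot x_s} = \sigma_{y \cdot x}$

  f_s    n_s    log n_s    n_s log n_s    θ'_s    log θ'_s    n_s log θ'_s
  16     18     1.2553     ...            170     ...         ...
  16     18     1.2553     ...            113     ...         ...
  16     18     1.2553     ...            140     ...         ...
  48     54                67.7862        423                 115.7346

\[
\begin{aligned}
\log L_1 &= \log 54 - \tfrac{1}{54}(67.7862) + \tfrac{1}{54}(115.7346) - \log 423 \\
         &= 1.7324 - 1.2553 + 2.1432 - 2.6263 = 9.9940 - 10 \\
L_1 &= .986
\end{aligned}
\]

Refer to Nayer's tables of $L_1$ with $k = 3$ and degrees of freedom $f = 16$. We
have $P > .05$. Therefore, we accept $H'_1$ and combine the results.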

Step 3. Calculate the following values for the analysis of variance of
y and x and the covariance of yx (with x held constant). The sums of
squares and of products for the different sources of variation are (see
Part I, Step 1):

(1) Within grades:

\[
\begin{aligned}
\sum y^2 &= a_1 - c_1 = 1546 = A_0 \\
\sum x^2 &= a_2 - c_2 = 16,994 = B_0 \\
\sum yx &= a_4 - c_4 = 4360 = D_0
\end{aligned}
\]

(2) Between grades:

\[
\begin{aligned}
\sum y^2 &= c_1 - d_1 = 61 = A_1 \\
\sum x^2 &= c_2 - d_2 = 5837 = B_1 \\
\sum yx &= c_4 - d_4 = 594 = D_1
\end{aligned}
\]

(3) Total:

\[
\begin{aligned}
\sum y^2 &= a_1 - d_1 = 1607 = A \\
\sum x^2 &= a_2 - d_2 = 22,831 = B \\
\sum yx &= a_4 - d_4 = 4954 = D
\end{aligned}
\]

Step 4. Calculate $b \sum yx$ for "within" and "total," where

\[
b \sum yx = \frac{(\sum yx)^2}{\sum x^2}.
\]

Refer to Step 3; we have

(1) Within grades: $b \sum yx = D_0^2 / B_0 = 1119 = M_0$

(2) Total: $b \sum yx = D^2 / B = 1075 = M$

Step 5. Calculate adjusted $\sum y^2$ for "within" and "total," and
reduced $\sum y^2$ for "between."

(1) Within grades: Adjusted $\sum y^2 = A_0 - M_0 = 427 = P_0$

(2) Total: Adjusted $\sum y^2 = A - M = 532 = P$

(3) Between grades: Reduced $\sum y^2 = P - P_0 = 105$
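Steps 3 to 5, together with the variance ratio formed from the adjusted and reduced sums of squares, can be sketched in Python as follows. The variable names mirror the symbols $A_0$, $B_0$, $D_0$, $A$, $B$, $D$ above; the degrees of freedom (50 within, 2 between) assume 54 pupils, 3 grades, and one covariate, as in this problem.

```python
def adjusted_sum_of_squares(sum_yy, sum_xx, sum_yx):
    """Sum of squares of y after removing the linear regression on x."""
    return sum_yy - sum_yx ** 2 / sum_xx


# Within-grades and total sums of squares and products from Step 3.
A0, B0, D0 = 1546, 16994, 4360
A, B, D = 1607, 22831, 4954

P0 = adjusted_sum_of_squares(A0, B0, D0)   # adjusted "within"
P = adjusted_sum_of_squares(A, B, D)       # adjusted "total"
reduced_between = P - P0                   # reduced "between"

# One degree of freedom is lost from "within" for the fitted regression.
df_within, df_between = 51 - 1, 2
f_ratio = (reduced_between / df_between) / (P0 / df_within)
```

Because the sketch carries full precision rather than the rounded whole numbers of the text, `f_ratio` differs from the printed 6.148 in the second decimal place; the conclusion is unaffected.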

Step 6. Analysis of variance and covariance to test the hypothesis
$H'_0\colon \bar{Y}_s = \bar{Y}$ with X held constant. The results are summarized in
Table 76.

TABLE 76

Analysis of Variance and Covariance of Final Score with Mental Age Held Constant

                                                       Adjusted or reduced
  Source of variation   D.F.   Σy²     Σx²      Σxy    D.F.   S.S.   M.S.     F       Hypothesis
  Within grades          51    1546    16,994   4360    50    427     8.54
  Between grades          2      61     5,837    594     2    105    52.50    6.148   Rejected
  Total                  53    1607    22,831   4954    52    532

Refer to Snedecor's tables of $F$ with $n_1 = 2$ and $n_2 = 50$. We have $P < .01$.
Therefore, we reject the hypothesis $H'_0$ and conclude that there are significant
differences among the means of final scores for these three grades with the effects of
mental age partialed out.

Part III

Complete procedures for the analysis of variance and covariance with 
two independent variables. 

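Step 1. Calculate the following values (see Part I, Steps 1 and 2):

\[
\begin{aligned}
\sum_t z_{1t}^2 &= \sum_t Z_{1t}^2 - \frac{(\sum_t Z_{1t})^2}{18} = a_{13} - c_{13} = 534 \\
\sum_t z_{2t}^2 &= \sum_t Z_{2t}^2 - \frac{(\sum_t Z_{2t})^2}{18} = a_{23} - c_{23} = 281 \\
\sum_t z_{3t}^2 &= \sum_t Z_{3t}^2 - \frac{(\sum_t Z_{3t})^2}{18} = a_{33} - c_{33} = 524 \\
\sum_t (y_{1t}z_{1t}) &= \sum_t (Y_{1t}Z_{1t}) - \frac{(\sum_t Y_{1t})(\sum_t Z_{1t})}{18} = a_{15} - c_{15} = 492 \\
\sum_t (y_{2t}z_{2t}) &= \sum_t (Y_{2t}Z_{2t}) - \frac{(\sum_t Y_{2t})(\sum_t Z_{2t})}{18} = a_{25} - c_{25} = 301 \\
\sum_t (y_{3t}z_{3t}) &= \sum_t (Y_{3t}Z_{3t}) - \frac{(\sum_t Y_{3t})(\sum_t Z_{3t})}{18} = a_{35} - c_{35} = 582 \\
\sum_t (x_{1t}z_{1t}) &= \sum_t (X_{1t}Z_{1t}) - \frac{(\sum_t X_{1t})(\sum_t Z_{1t})}{18} = a_{16} - c_{16} = 1190 \\
\sum_t (x_{2t}z_{2t}) &= \sum_t (X_{2t}Z_{2t}) - \frac{(\sum_t X_{2t})(\sum_t Z_{2t})}{18} = a_{26} - c_{26} = 751 \\
\sum_t (x_{3t}z_{3t}) &= \sum_t (X_{3t}Z_{3t}) - \frac{(\sum_t X_{3t})(\sum_t Z_{3t})}{18} = a_{36} - c_{36} = 1961
\end{aligned}
\]

From Part I, Step 2, and Part II, Step 1, we also have

\[
\begin{aligned}
\sum_t y_{1t}^2 &= 498, & \sum_t y_{2t}^2 &= 364, & \sum_t y_{3t}^2 &= 684, \\
\sum_t x_{1t}^2 &= 4467, & \sum_t x_{2t}^2 &= 3272, & \sum_t x_{3t}^2 &= 9255, \\
\sum_t (y_{1t}x_{1t}) &= 1211, & \sum_t (y_{2t}x_{2t}) &= 906, & \sum_t (y_{3t}x_{3t}) &= 2243.
\end{aligned}
\]

Then, we have

\[
\begin{aligned}
M'_1 &= \frac{\sum_t z_{1t}^2 \, [\sum_t (y_{1t}x_{1t})]^2 - \sum_t (x_{1t}z_{1t}) \sum_t (y_{1t}z_{1t}) \sum_t (y_{1t}x_{1t})}{\sum_t x_{1t}^2 \sum_t z_{1t}^2 - [\sum_t (x_{1t}z_{1t})]^2} \\
     &= \frac{534(1211)^2 - 1190(492)(1211)}{4467(534) - (1190)^2}
      = \frac{783,122,214 - 709,016,280}{969,278} = \frac{74,105,934}{969,278} = 76 \\
M'_2 &= \frac{\sum_t z_{2t}^2 \, [\sum_t (y_{2t}x_{2t})]^2 - \sum_t (x_{2t}z_{2t}) \sum_t (y_{2t}z_{2t}) \sum_t (y_{2t}x_{2t})}{\sum_t x_{2t}^2 \sum_t z_{2t}^2 - [\sum_t (x_{2t}z_{2t})]^2} \\
     &= \frac{281(906)^2 - 751(301)(906)}{3272(281) - (751)^2}
      = \frac{230,654,916 - 204,802,206}{355,431} = \frac{25,852,710}{355,431} = 73 \\
M'_3 &= \frac{\sum_t z_{3t}^2 \, [\sum_t (y_{3t}x_{3t})]^2 - \sum_t (x_{3t}z_{3t}) \sum_t (y_{3t}z_{3t}) \sum_t (y_{3t}x_{3t})}{\sum_t x_{3t}^2 \sum_t z_{3t}^2 - [\sum_t (x_{3t}z_{3t})]^2} \\
     &= \frac{524(2243)^2 - 1961(582)(2243)}{9255(524) - (1961)^2}
      = \frac{2,636,269,676 - 2,559,940,386}{1,004,099} = \frac{76,329,290}{1,004,099} = 76 \\
N'_1 &= \frac{\sum_t x_{1t}^2 \, [\sum_t (y_{1t}z_{1t})]^2 - \sum_t (x_{1t}z_{1t}) \sum_t (y_{1t}x_{1t}) \sum_t (y_{1t}z_{1t})}{\sum_t x_{1t}^2 \sum_t z_{1t}^2 - [\sum_t (x_{1t}z_{1t})]^2} \\
     &= \frac{4467(492)^2 - 709,016,280}{969,278}
      = \frac{1,081,299,888 - 709,016,280}{969,278} = \frac{372,283,608}{969,278} = 384 \\
N'_2 &= \frac{\sum_t x_{2t}^2 \, [\sum_t (y_{2t}z_{2t})]^2 - \sum_t (x_{2t}z_{2t}) \sum_t (y_{2t}x_{2t}) \sum_t (y_{2t}z_{2t})}{\sum_t x_{2t}^2 \sum_t z_{2t}^2 - [\sum_t (x_{2t}z_{2t})]^2} \\
     &= \frac{3272(301)^2 - 204,802,206}{355,431}
      = \frac{297,431,344 - 204,802,206}{355,431} = \frac{92,629,138}{355,431} = 261 \\
N'_3 &= \frac{\sum_t x_{3t}^2 \, [\sum_t (y_{3t}z_{3t})]^2 - \sum_t (x_{3t}z_{3t}) \sum_t (y_{3t}x_{3t}) \sum_t (y_{3t}z_{3t})}{\sum_t x_{3t}^2 \sum_t z_{3t}^2 - [\sum_t (x_{3t}z_{3t})]^2} \\
     &= \frac{9255(582)^2 - 2,559,940,386}{1,004,099}
      = \frac{3,134,890,620 - 2,559,940,386}{1,004,099} = \frac{574,950,234}{1,004,099} = 573
\end{aligned}
\]

Define

\[
\text{Adjusted } \sum_t y_{st}^2 = \sum_t (y_{st} - b_{s1} x_{st} - b_{s2} z_{st})^2,
\]

where

\[
b_{s1} = \frac{\sum_t z_{st}^2 \sum_t (y_{st}x_{st}) - \sum_t (x_{st}z_{st}) \sum_t (y_{st}z_{st})}{\sum_t x_{st}^2 \sum_t z_{st}^2 - [\sum_t (x_{st}z_{st})]^2},
\qquad
b_{s2} = \frac{\sum_t x_{st}^2 \sum_t (y_{st}z_{st}) - \sum_t (x_{st}z_{st}) \sum_t (y_{st}x_{st})}{\sum_t x_{st}^2 \sum_t z_{st}^2 - [\sum_t (x_{st}z_{st})]^2}.
\]

By troublesome algebraic operation, we have

\[
\text{Adjusted } \sum_t y_{st}^2 = \sum_t y_{st}^2 - M'_s - N'_s.
\]

Define $\theta''_s = \text{adjusted } \sum_t y_{st}^2$.

Therefore, we have

\[
\begin{aligned}
\theta''_1 &= \sum_t y_{1t}^2 - M'_1 - N'_1 = 38 \\
\theta''_2 &= \sum_t y_{2t}^2 - M'_2 - N'_2 = 30 \\
\theta''_3 &= \sum_t y_{3t}^2 - M'_3 - N'_3 = 35
\end{aligned}
\]

Step 2. Use the $L_1$-criterion to test the hypothesis $H''_1\colon \sigma_{y_s \cdot x_s z_s} = \sigma_{y \cdot xz}$.
The calculations involved are summarized in Table 77.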


TABLE 77

$L_1$-Calculations for $H''_1\colon \sigma_{y_s \cdot x_s z_s} = \sigma_{y \cdot xz}$

  f_s    n_s    log n_s    n_s log n_s    θ''_s    log θ''_s    n_s log θ''_s
  15     18     1.2553     ...            38       1.5798       ...
  15     18     1.2553     ...            30       1.4771       ...
  15     18     1.2553     ...            35       1.5441       ...
  45     54                67.7862        103                   82.8180

\[
\begin{aligned}
\log L_1 &= \log 54 - \tfrac{1}{54}(67.7862) + \tfrac{1}{54}(82.8180) - \log 103 \\
         &= 1.7324 - 1.2553 + 1.5337 - 2.0128 = 9.9980 - 10 \\
L_1 &= .995
\end{aligned}
\]

Refer to Nayer's tables of $L_1$ with $k = 3$ and degrees of freedom $f = 15$. We
have $P > .05$. Therefore, we accept $H''_1$ and combine the results.

Step 3. Calculate the necessary values for the analysis of variance of
x, y, and z and covariance of yx, yz, and xz (with both x and z held
constant). The sums of squares and of products for the different sources
of variation are (see Part I, Step 1 and Part II, Step 3):

(1) Within grades:

\[
\begin{aligned}
\sum y^2 &= 1546 = A_0 & \sum yx &= 4360 = D_0 \\
\sum x^2 &= 16,994 = B_0 & \sum yz &= a_5 - c_5 = 1375 = E_0 \\
\sum z^2 &= a_3 - c_3 = 1339 = C_0 & \sum xz &= a_6 - c_6 = 3942 = F_0
\end{aligned}
\]

(2) Between grades:

\[
\begin{aligned}
\sum y^2 &= 61 = A_1 & \sum yx &= 594 = D_1 \\
\sum x^2 &= 5837 = B_1 & \sum yz &= c_5 - d_5 = 71 = E_1 \\
\sum z^2 &= c_3 - d_3 = 84 = C_1 & \sum xz &= c_6 - d_6 = 697 = F_1
\end{aligned}
\]

(3) Total:

\[
\begin{aligned}
\sum y^2 &= 1607 = A & \sum yx &= 4954 = D \\
\sum x^2 &= 22,831 = B & \sum yz &= a_5 - d_5 = 1446 = E \\
\sum z^2 &= a_3 - d_3 = 1423 = C & \sum xz &= a_6 - d_6 = 4639 = F
\end{aligned}
\]

Step 4. Calculate $b_1 \sum yx$ and $b_2 \sum yz$ for "within" and "total," where

\[
b_1 \sum yx = \frac{\sum z^2 (\sum yx)^2 - \sum xz \sum yz \sum yx}{\sum x^2 \sum z^2 - (\sum xz)^2},
\qquad
b_2 \sum yz = \frac{\sum x^2 (\sum yz)^2 - \sum xz \sum yx \sum yz}{\sum x^2 \sum z^2 - (\sum xz)^2}.
\]

Refer to Step 3. We have

(1) Within grades:

\[
b_1 \sum yx = \frac{C_0 D_0^2 - F_0 E_0 D_0}{B_0 C_0 - F_0^2} = 252 = M'_0,
\qquad
b_2 \sum yz = \frac{B_0 E_0^2 - F_0 D_0 E_0}{B_0 C_0 - F_0^2} = 1178 = N'_0
\]

(2) Total:

\[
b_1 \sum yx = \frac{C D^2 - F E D}{B C - F^2} = 154 = M',
\qquad
b_2 \sum yz = \frac{B E^2 - F D E}{B C - F^2} = 1323 = N'
\]

Step 5. Calculate adjusted $\sum y^2$ for "within" and "total," and
reduced $\sum y^2$ for "between."

(1) Within grades: adjusted $\sum y^2 = A_0 - M'_0 - N'_0 = 116 = P'_0$

(2) Total: adjusted $\sum y^2 = A - M' - N' = 130 = P'$

(3) Between grades: reduced $\sum y^2 = P' - P'_0 = 14$

Step 6. Analysis of variance and covariance to test the hypothesis
$H''_0\colon \bar{Y}_s = \bar{Y}$ with both X and Z held constant. The results are
summarized in Table 78.

TABLE 78

Analysis of Variance and Covariance of Final Score with Both Mental Age
and Initial Score Held Constant

                                                                               Adjusted or reduced
  Source of variation   D.F.   Σy²     Σx²      Σz²     Σyx     Σyz     Σxz     D.F.   S.S.   M.S.   F      Hypothesis
  Within grades          51    1,546   16,994   1,339   4,360   1,375   3,942    49    116    2.37
  Between grades          2       61    5,837      84     594      71     697     2     14    7.00   2.95   Accepted
  Total                  53    1,607   22,831   1,423   4,954   1,446   4,639    51    130

Refer to Snedecor's tables of $F$ with $n_1 = 2$ and $n_2 = 49$. We have $P > .05$. So
we accept the hypothesis $H''_0$ and conclude that there are no significant differences
among the means of final scores for these three grades with both the effects of mental
age and initial score partialed out.
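The two-covariate adjustment of Part III, Steps 4 to 6, can be sketched in the same way. The function below implements the expressions for $b_1 \sum yx$ and $b_2 \sum yz$ given in Step 4; its name and argument order are illustrative, and the degrees of freedom (49 within, 2 between) assume 54 pupils, 3 grades, and two covariates.

```python
def adjust_for_two_covariates(A, B, C, D, E, F):
    """Adjusted sum of squares of y with x and z held constant.

    A, B, C are the sums of squares of y, x, z;
    D, E, F are the sums of products yx, yz, xz.
    """
    denom = B * C - F ** 2
    m = (C * D ** 2 - F * E * D) / denom   # b1 * sum(yx)
    n = (B * E ** 2 - F * D * E) / denom   # b2 * sum(yz)
    return A - m - n


# Within-grades and total values from Part III, Step 3.
p_within = adjust_for_two_covariates(1546, 16994, 1339, 4360, 1375, 3942)
p_total = adjust_for_two_covariates(1607, 22831, 1423, 4954, 1446, 4639)
reduced_between = p_total - p_within

f_ratio = (reduced_between / 2) / (p_within / 49)
```

Carried at full precision, `f_ratio` comes out near 3.0 rather than the 2.95 obtained from the rounded entries of Table 78; either value falls short of the 5 per cent point, so the hypothesis is retained.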

Analysis of Variance in the Case of Unequal or Disproportionate
Numbers of Observations in the Subclasses. The analysis of variance in
the case of a single criterion of classification with unequal numbers in the
subclasses introduces no new difficulty, as has been indicated in Problem
XI.2. However, when our data have been classified on the basis of two
or more criteria with unequal subclass numbers, new difficulties arise.

In agriculture and in other experimental sciences it is usually possible
to design an experiment so that each subclass always has the same number
of individuals. If this were a necessary condition, the use of the powerful
tool of analysis of variance would be greatly restricted, since there are
fields, such as those dealing with human beings (education and psychology,
for instance), where unequal representation in the cells of a multiple
classification of data is of common occurrence, both in experimentation
and in other observational programs, including data collected
by governmental and state agencies. There is an urgent need, therefore,
for a systematic formulation of methods of attacking problems when
unequal representation in the subclasses occurs. Methods have been
developed for such problems (Refs. 8, 9, 10).

Tsao (Ref. 10) has treated the problem of analysis of variance and
covariance for unequal or disproportionate representation in the subclasses
by giving the mathematical solution with the specified restrictions
defined and by proposing new approximate methods with the
respective statistical assumptions to be fulfilled. Our consideration of
this problem is limited to the presentation of an approximation method of
analysis for unequal representation in the subclasses of two classifications.

Problem XI.6. An approximation method of analysis of variance for 
unequal frequencies in the subclasses of two classifications. We take 
the problem of testing two hypotheses: (1) that the grade means on a 
speed of reading test are equal and (2) that the school means on the 
reading test are equal. The basic data for the fifth, sixth, seventh, and 
eighth grades in each of two schools are given in Table 79, including the 
appropriate notations. The complete analysis of the problem follows. 


TABLE 79

Calculated Measures for Speed Score in Gates Reading-Survey Test

  School   Grade   n_si   X̄_si    s'_si   Σ_t x²_sit = Σ_t (X_sit - X̄_si)²
  A        5       41     49.68   12.53   6280
           6       39     41.08   11.28   4835
           7       32     42.41    9.99   3094
           8       36     53.25   10.59   3925
  B        5       26     33.92   12.47   3888
           6       27     29.22   11.02   3157
           7       34     32.50   10.00   3300
           8       32     40.53    9.84   3002

where $s = 1, 2, 3, 4$ represent grades 5, 6, 7, and 8, respectively
$i = 1, 2$ represent schools A and B, respectively
$t = 1, 2, \ldots, n_{si}$
$n_{si}$ is the number of observations for the $s$th grade in the $i$th school
$\bar{X}_{si}$ is the mean score for the $s$th grade in the $i$th school
$s'_{si}$ is the unbiased estimate of standard deviation for the $s$th grade in the $i$th
school

$\bar{X}_{si}$ and $s'_{si}$ are obtained by the following definitions:

\[
\bar{X}_{si} = \frac{\sum_t X_{sit}}{n_{si}},
\qquad
s'_{si} = \sqrt{\frac{\sum_t (X_{sit} - \bar{X}_{si})^2}{n_{si} - 1}}.
\]

Step 1. Use criterion $L_1$ to test the hypothesis $H_1\colon \sigma_{si} = \sigma$.
Let us define:

\[
\theta'_{si} = \sum_t x_{sit}^2 = \sum_t (X_{sit} - \bar{X}_{si})^2
\]

The calculations for $L_1$ are summarized in Table 80.


TABLE 80

$L_1$-Calculations for $H_1\colon \sigma_{si} = \sigma$

  f_si   n_si      log n_si   n_si log n_si   θ'_si    log θ'_si   n_si log θ'_si
  40     41        1.6128     ...             6,280    3.7980      ...
  38     39        1.5911     ...             4,835    3.6844      ...
  31     32        1.5051     ...             3,094    3.4905      ...
  35     36        1.5563     ...             3,925    3.5938      ...
  25     26        1.4150     ...             3,888    3.5897      ...
  26     27        1.4314     ...             3,157    3.4993      ...
  33     34        1.5315     ...             3,300    3.5185      ...
  31     32        1.5051     ...             3,002    3.4774      ...
         N = 267              408.0397        31,481               959.2015

The harmonic mean of $f_{si}$ is

\[
\frac{8}{\frac{1}{40} + \frac{1}{38} + \frac{1}{31} + \frac{1}{35} + \frac{1}{25} + \frac{1}{26} + \frac{1}{33} + \frac{1}{31}}
= \frac{8}{.253168} = 31.60
\]

\[
\begin{aligned}
\log L_1 &= \log N - \frac{1}{N} \sum_s \sum_i n_{si} \log n_{si} + \frac{1}{N} \sum_s \sum_i n_{si} \log \theta'_{si} - \log \sum_s \sum_i \theta'_{si} \\
         &= \log 267 - \tfrac{1}{267}(408.0397) + \tfrac{1}{267}(959.2015) - \log 31,481 \\
         &= 2.4265 - 1.5282 + 3.5936 - 4.4980 = 9.9939 - 10
\end{aligned}
\]

Therefore, $L_1 = .986$. Refer to Nayer's tables of $L_1$ (Table V, Appendix)
with $k = 8$ and degrees of freedom $f = 31.60$. We find that $P > .05$.
Therefore, we accept $H_1$. We may assume that the eight groups have
a common variance, and combine the results.
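Because the eight subclasses have different numbers of observations, the tables of $L_1$ are entered with the harmonic mean of the subclass degrees of freedom. A small Python check of that figure, using only the standard library:

```python
from statistics import harmonic_mean

# Degrees of freedom f_si = n_si - 1 for the eight school-by-grade subclasses.
subclass_sizes = [41, 39, 32, 36, 26, 27, 34, 32]
degrees_of_freedom = [n - 1 for n in subclass_sizes]

f_harmonic = harmonic_mean(degrees_of_freedom)
```

`f_harmonic` agrees with the 31.60 used above to two decimals.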

Step 2. Use the $\chi^2$-criterion to test the goodness of fit for the equal
frequencies in each subclass. First calculate the mean frequency:
$\bar{n} = 267/8 = 33.375$. The results of the $\chi^2$ test are summarized in
Table 81.


TABLE 81

Calculation of $\chi^2$

  f_o    f_e       |f_o - f_e|   (f_o - f_e)²   (f_o - f_e)²/f_e
  41     33.375    7.625         58.140625      1.7420
  39     33.375    5.625         31.640625      0.9480
  32     33.375    1.375          1.890625      0.0566
  36     33.375    2.625          6.890625      0.2065
  26     33.375    7.375         54.390625      1.6297
  27     33.375    6.375         40.640625      1.2177
  34     33.375    0.625          0.390625      0.0117
  32     33.375    1.375          1.890625      0.0566
  267    267.000                                χ₀² = 5.8688

We find

\[
\chi_0^2 = 5.8688
\]

Refer to the $\chi^2$-table with d.f. = 7. We find $.70 > P > .50$. Therefore,
we conclude that for our data the class numbers do not differ
significantly. It is justifiable to use the approximation method.
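The $\chi^2$ computation of Table 81 is short enough to verify directly. The sketch below assumes, as in the text, that the expected frequency in every subclass is the mean frequency.

```python
# Observed subclass frequencies from Table 79.
observed = [41, 39, 32, 36, 26, 27, 34, 32]

# Under the hypothesis of equal frequencies, each expected count is the mean.
expected = sum(observed) / len(observed)

chi_square = sum((f_o - expected) ** 2 / expected for f_o in observed)
degrees_of_freedom = len(observed) - 1
```

The value of `chi_square` matches the 5.8688 of Table 81 to three decimals, with 7 degrees of freedom; the probability itself is still read from the $\chi^2$-table.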

Step 3. Convert Table 79 into a table with equal frequency of 33.375
for each subclass.

Retaining the original estimate of the standard deviation, we have the
following estimates of $\sum_t x_{sit}^2$:

\[
\begin{aligned}
\sum_t x_{11t}^2 &= \frac{33.375}{41}(6280) = 5112 &
\sum_t x_{12t}^2 &= \frac{33.375}{26}(3888) = 4991 \\
\sum_t x_{21t}^2 &= \frac{33.375}{39}(4835) = 4138 &
\sum_t x_{22t}^2 &= \frac{33.375}{27}(3157) = 3902 \\
\sum_t x_{31t}^2 &= \frac{33.375}{32}(3094) = 3227 &
\sum_t x_{32t}^2 &= \frac{33.375}{34}(3300) = 3239 \\
\sum_t x_{41t}^2 &= \frac{33.375}{36}(3925) = 3639 &
\sum_t x_{42t}^2 &= \frac{33.375}{32}(3002) = 3131
\end{aligned}
\]

These results, together with the data in Table 79, are summarized
in Table 82. The notations are the same as in Table 79.


TABLE 82

Expected Measures for Speed Score in Gates Reading Survey

  School   Grade   n̄        X̄_si    s'_si   Σ_t x²_sit
           (1)     (2)      (3)     (4)     (5)
  A        5       33.375   49.68   12.53   5,112
           6       33.375   41.08   11.28   4,138
           7       33.375   42.41    9.99   3,227
           8       33.375   53.25   10.59   3,639
  B        5       33.375   33.92   12.47   4,991
           6       33.375   29.22   11.02   3,902
           7       33.375   32.50   10.00   3,239
           8       33.375   40.53    9.84   3,131


Step 4. Calculate the different kinds of mean scores. At least
6 decimal places should be carried, if possible. The different kinds
of mean scores are given in Table 83.

TABLE 83

Different Kinds of Mean Scores

  i \ s     1        2        3        4        X̄_.i
  1         49.68    41.08    42.41    53.25    46.605
  2         33.92    29.22    32.50    40.53    34.0425
  X̄_s.      41.800   35.150   37.455   46.890   40.32375 = X̄..


Step 5. Calculate the following values:

\[
\begin{aligned}
a &= N \bar{X}_{\cdot\cdot}^2 = 267(40.32375)^2 = 434,143, \quad \text{where } N = 8\bar{n} \\
c &= 2\bar{n} \sum_s \bar{X}_{s\cdot}^2 = 66.75[(41.800)^2 + \cdots + (46.890)^2] = 439,503 \\
d &= 4\bar{n} \sum_i \bar{X}_{\cdot i}^2 = 133.5[(46.605)^2 + (34.0425)^2] = 444,678 \\
e &= \bar{n} \sum_s \sum_i \bar{X}_{si}^2 = 33.375[(49.68)^2 + \cdots + (40.53)^2] = 450,324
\end{aligned}
\]

Step 6. Calculate the sum of squares for the different sources of
variation:

(1) Within subclasses: $\sum_s \sum_i \sum_t x_{sit}^2 = (5112 + \cdots + 3131) = 31,379$
[Refer to Table 82, column (5)]

(2) Interaction: $\bar{n} \sum_s \sum_i \bar{X}_{si}^2 - 2\bar{n} \sum_s \bar{X}_{s\cdot}^2 - 4\bar{n} \sum_i \bar{X}_{\cdot i}^2 + N \bar{X}_{\cdot\cdot}^2 = e - c - d + a = 286$

(3) Between grades: $2\bar{n} \sum_s \bar{X}_{s\cdot}^2 - N \bar{X}_{\cdot\cdot}^2 = c - a = 5360$

(4) Between schools: $4\bar{n} \sum_i \bar{X}_{\cdot i}^2 - N \bar{X}_{\cdot\cdot}^2 = d - a = 10,535$

(5) Total: (1) + (2) + (3) + (4) = 47,560
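Steps 4 to 6 rest entirely on the eight cell means and the common frequency $\bar{n}$, so they can be sketched compactly in Python. The list layout (rows for schools, columns for grades) and the variable names are illustrative choices.

```python
# Cell means from Table 83: rows are schools (i), columns are grades (s).
cell_means = [
    [49.68, 41.08, 42.41, 53.25],  # school A
    [33.92, 29.22, 32.50, 40.53],  # school B
]
n_bar = 33.375                      # common (mean) subclass frequency

q = len(cell_means)                 # number of schools
p = len(cell_means[0])              # number of grades
big_n = n_bar * p * q               # N = 8 * n_bar = 267

school_means = [sum(row) / p for row in cell_means]
grade_means = [sum(row[s] for row in cell_means) / q for s in range(p)]
grand_mean = sum(school_means) / q

a = big_n * grand_mean ** 2
c = q * n_bar * sum(m ** 2 for m in grade_means)
d = p * n_bar * sum(m ** 2 for m in school_means)
e = n_bar * sum(x ** 2 for row in cell_means for x in row)

interaction = e - c - d + a
between_grades = c - a
between_schools = d - a
```

The between-grades and between-schools sums of squares agree with the text to within rounding. Recomputed directly from the eight cell means, `e` comes out near 450,334, so the interaction term from this sketch is about 297 rather than the printed 286.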

Step 7. Analysis of variance to test different hypotheses. First, we
wish to test the hypothesis
$H_1\colon \bar{X}_{11} - \bar{X}_{12} = \bar{X}_{21} - \bar{X}_{22} = \bar{X}_{31} - \bar{X}_{32} = \bar{X}_{41} - \bar{X}_{42}$;
or that there is no interaction between grade and school.
The results are summarized in Table 84. It is noted that if we have p
grades and q schools, then the degrees of freedom for each source of
variation are as follows:

  Within subclasses    N - pq
  Interaction          (p - 1)(q - 1)
  Between grades       p - 1
  Between schools      q - 1
  Total                N - 1

The additive property of degrees of freedom is clearly demonstrated.
From the results in Table 84, we may accept the hypothesis that the
interaction is not significantly different from zero. Therefore, we may
pool the sum of squares due to "interaction" with the "within" sum of
squares, as well as the degrees of freedom. We may call this sum "residual";
it can be used as the basis of testing the other hypotheses. (Note:
If the interaction is significant, we do not pool it with "within.") Next,
we wish to test the other two hypotheses, namely,
$H'_0\colon \bar{X}_{1\cdot} = \bar{X}_{2\cdot} = \bar{X}_{3\cdot} = \bar{X}_{4\cdot}$
and $H''_0\colon \bar{X}_{\cdot 1} = \bar{X}_{\cdot 2}$. The first hypothesis is that there is no
difference between the four grade means. The second hypothesis is
that there is no difference between the two school means. The results
are summarized in Table 85.


TABLE 84

Results of Testing the Hypothesis $H_1$

  Source of variation   D.F.   S.S.     M.S.     Hypothesis
  Within subclasses     259    31,379   121.15
  Interaction             3       286    95.33   Accepted

TABLE 85

Analysis of Variance for Speed Score in Gates Reading Survey

  Source of variation   D.F.   S.S.     M.S.        F       Hypothesis
  Residual              262    31,665      120.86
  Between grades          3     5,360    1,786.66   14.78   Rejected
  Between schools         1    10,535   10,535.00   87.17   Rejected
  Total                 266    47,560

From the results in Table 85, we reject both the hypotheses $H'_0$ and $H''_0$.
Therefore, we conclude that there are significant differences between the
means of the grades and that there is also a significant difference between
the means of the schools.
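The pooling of "interaction" with "within" and the two variance ratios of Table 85 can be retraced as follows. This is a minimal sketch that takes the sums of squares and degrees of freedom exactly as printed in Step 6 and Table 84.

```python
# Sums of squares and degrees of freedom as printed in Step 6 and Table 84.
ss_within, df_within = 31379, 259
ss_interaction, df_interaction = 286, 3
ss_grades, df_grades = 5360, 3
ss_schools, df_schools = 10535, 1

# Interaction was not significant, so it is pooled with "within".
ss_residual = ss_within + ss_interaction
df_residual = df_within + df_interaction
ms_residual = ss_residual / df_residual

f_grades = (ss_grades / df_grades) / ms_residual
f_schools = (ss_schools / df_schools) / ms_residual
```

Both ratios reproduce Table 85 to two decimals (14.78 and 87.17), each tested against the pooled residual mean square with 262 degrees of freedom.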

Problems 

1. Is there a significant difference among the means of reaction times for 
age and for sex? 

Reaction Times in Seconds to Light and Sound of Various Age Groups (4-60
Years) according to Sex

                       Male                                   Female
  Age             Light           Sound                  Light           Sound
  group    N    Mean   S.D.*    Mean   S.D.       N    Mean   S.D.*    Mean   S.D.
  A        10   .34    .1070    .34    .0928      10   .62    .1644    .59    .1890
  B        10   .24    .0400    .23    .0409      10   .32    .0340    .31    .0407
  C        10   .22    .0331    .19    .0338      10   .26    .0192    .20    .0736
  D        10   .26    .0465    .24    .0141      10   .34    .0378    .30    .1139
  E        10   .27    .0266    .25    .0467      10   .36    .0342    .30    .0372
  F        10   .38    .0574    .37    .0806      10   .44    .0721    .42    .0842

* Standard deviation, Pearsonian.


2. Give a complete analysis of variance for the following data:

Reported Tests with Stanford Achievement Test Battery in 1924 (Data
from Baldwin)

           Age    Number of cases   Mean    Unbiased S.D.
  Boys      9     100               27.4    [illegible]
           10     117               37.9    [illegible]
           11      96               44.2    [illegible]
  Girls     9     [illegible]       29.1    [illegible]
           10     [illegible]       38.3    [illegible]
           11     [illegible]       44.2    11.04

3. Test the significance of the difference between the means of students in
arithmetic computation in the different types of schools, Grade 4.


Arithmetic Computation Scores by Type of School
(After Peterson, 1948)

  Score                           Frequency
  interval   Boarding   Day    Mission   Non-res.   Public   Total
  55-59          0        1       0         0          0        1
  50-54          1        1       0         0          0        2
  45-49          4        3       0         1          1        9
  40-44          4       10       1         0         15       30
  35-39         41       60      29         8         90      228
  30-34         84      146      48        17        231      526
  25-29         80      148      31        17        222      498
  20-24         69      166      30        16        129      410
  15-19         75      165      24        12        123      399
  10-14         48      130      11         8         62      259
   5- 9         37       98       8         6         47      196
   0- 4         11       36       3         5         19       74
  Total        454      964     185        90        939     2632


4. The data below were obtained from the administration
of two tests to a random sample of 132 students in a class in college
biology. Test 1 was designed to measure the acquisition of fundamental
facts and principles; Test 2, to measure the ability to apply
a knowledge of facts and principles.

Problem: Test the linearity of regression of scores in Test 2 on scores
in Test 1.

  Student   Score on        Student   Score on        Student   Score on
  No.     Test 1  Test 2    No.     Test 1  Test 2    No.     Test 1  Test 2
    1       63      34       45       63      34       89       53      27
    2       71      42       46       83      44       90       49      22
    3       70      41       47       80      52       91       90      49
    4      119      50       48       89      49       92       69      31
    5      109      57       49       98      44       93       52      37
    6       75      30       50       73      35       94       40      41
    7       88      33       51       65      30       95       82      40
    8       83      55       52       62      30       96       90      37
    9       68      20       53      114      54       97      108      54
   10       59      35       54      105      39       98       83      40
   11       55      43       55       88      35       99       98      37
   12      106      47       56       78      49      100       61      18
   13       56      35       57       69      51      101       80      39
   14       81      51       58       67      36      102       70      40
   15      102      48       59       79      29      103       60      30
   16       94      43       60       80      38      104       66      34
   17       97      40       61       47      36      105       71      31
   18       84      39       62       68      42      106       85      46
   19       91      51       63       93      44      107       43      26
   20       85      41       64       78      37      108       65      32
   21      106      49       65       51      34      109       53      35
   22       86      49       66       92      46      110       88      45
   23      104      41       67       76      36      111       68      41
   24       78      40       68      105      57      112       93      46
   25       91      51       69       55      32      113       91      47
   26       82      43       70       86      50      114      101      56
   27       64      34       71       71      30      115       94      40
   28       55      38       72       70      31      116       91      41
   29       87      40       73       68      28      117       73      33
   30       50      30       74       81      39      118       99      47
   31       75      46       75       81      48      119       99      45
   32       73      41       76       65      39      120       66      40
   33       59      43       77      104      49      121       78      40
   34       91      48       78       88      43      122       56      37
   35       80      52       79       78      32      123       93      48
   36      105      59       80       84      40      124       85      38
   37       97      48       81       92      47      125       58      36
   38       77      39       82       84      35      126       92      43
   39      124      52       83       78      48      127       75      31
   40       68      34       84       66      25      128       66      27
   41      101      49       85       94      53      129       69      44
   42       81      34       86       52      39      130      111      50
   43       69      44       87       61      38      131       73      35
   44       73      40       88       96      43      132       73      41

5. Analyze the following data obtained for Indian students in the
twelfth grade showing scores on an arithmetic test and the number of
schools attended (after Peterson, 1948):


  Arithmetic         Number of schools attended
  comp. score     1      2      3      4    Over 4
  95-100          0      1      0      0      0
  90- 94          0      0      0      0      0
  85- 89          0      0      0      0      0
  80- 84          0      0      0      0      0
  75- 79          2      2      2      4      0
  70- 74         18     16     13     16      9
  65- 69         12     39     20     21     20
  60- 64         15     56     30     21      8
  55- 59         14     48     23     18     18
  50- 54          8     46     22     13     13
  45- 49          2     30     21     11      8
  40- 44          2     16     19     13      6
  35- 39          3     17      9      5      3
  30- 34          1      5      7      7      0
  25- 29          0      7      3      0      0
  20- 24          0      3      1      0      0
  15- 19          1      2      1      0      0
  10- 14          0      0      0      0      0
   5-  9          0      1      1      1      0
   0-  4          0      1      0      0      0
  Total          78    290    172    130     85


6 . Test the significance of the difference between the means on the achieve¬ 
ment test of the experimental and control groups after adjustment has 
been made for any inequalities in the two groups with respect to pretest 
and I.Q. scores. The data on the following pages derive from an 
experiment to evaluate the effectiveness of the school excursion in teach¬ 
ing a unit on Communication in the sixth grade in eight elementary 
schools (Clark, 1938). 
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Primary Data for Schools Comprising Control Groups 


Individual 

C.A. 

1 M.A. 

Pretest 

score 

Final test 
score 

1 

1.3 

117 

23 

31 

2 

146 

146 

34 

50 

3 

148 

129 

40 

44 

4 

142 

142 

41 

51 

5 

152 

137 

32 

39 

6 

143 

138 

4b 

53 

7 

141 

140 

29 

38 

8 

157 

145 

37 

39 

9 

139 

142 

32 

47 

10 

141 

152 

32 

49 

11 

145 

158 

47 

57 

12 

143 

144 

38 

47 

13 

146 

158 

39 

50 

14 

144 

130 

28 

44 

15 

143 

169 

36 

49 

16 

147 

155 

39 

48 

17 

143 

158 

51 

58 

18 

148 

124 

15 

26 

19 

151 

149 

46 

53 

20 

138 

172 

51 

56 

21 

140 

147 

34 

50 

22 

143 

146 

35 

40 

23 

146 

140 

29 

37 

24 

161 

147 

24 

31 

25 

142 

156 

39 

46 

26 

145 

171 

44 

58 

27 

147 

141 

32 

49 

28 

134 

145 

34 

52 

29 

151 

141 

33 

45 

30 

146 

148 

39 

43 

31 

153 

161 

29 

42 

32 

146 

141 

29 

42 

33 

175 

142 

28 

48 

34 

144 

145 

34 

39 

35 

149 

144 

32 

51 

36 

150 

127 

24 

33 

37 

144 

125 

22 

29 

38 

158 

134 

24 

40 

39 

134 

157 

55 

56 

40 

142 

149 

27 

37 

41 

140 

140 

41 

52 

42 

145 

146 

43 

46 

43 

146 

134 

37 

46 

44 

145 

152 

38 

45 

45 

144 

150 

54 

59 

46 

157 

147 

35 

45 

47 

154 

145 

36 

52 

48 

139 

125 

16 

33 

49 

145 

135 

29 

38 

50 

143 

150 

37 

48 

Primary Data for Schools Comprising Control Groups (Continued)

Individual   C.A.   M.A.   Pretest score   Final test score
     51       171    124         11               25
     52       151    154         35               39
     53       163    149         36               44
     54       146    148         25               39
     55       139    155         35               50
     56       149    137         27               35
     57       142    128         14               25
     58        90    147         33               36
     59       140    154         27               38
     60       141    143         28               40
     61       146    128          8               23
     62       146    146         40               56
     63       148    152         49               59
     64       142    136         29               44
     65       160    140         35               40
     66       142    154         23               40
     67       145    157         41               55
     68       146    159         43               57
     69       143    136         36               48
     70       146    139         40               37
     71       143    145         21               38
     72       144    134         40               47
     73       145    161         37               45
     74       146    146         27               34
     75       145    148         38               51
     76       155    160         29               42
     77       143    145         32               49
     78       156    126         17               31
     79       139    142         23               34
     80       146    152         30               43
     81       141    166         26               38
     82       134    155         33               44
     83       142    146         34               39
     84       137    151         35               46
     85       140    142         26               42
     86       138    161         37               48
     87       150    142         28               38
     88       143    155         38               41
     89       143    137         19               34
     90       143    151         27               40
     91       143    150         25               40
     92       143    146         42               51
     93       162    131         34               51
     94       154    143         43               57
     95       158    132         38               48
     96       144    149         42               55
     97       148    138         35               45
     98       145    149         49               67
     99       145    147         40               52
    100       141    167         56               62

Primary Data for Schools Comprising Control Groups (Continued)

Individual   C.A.   M.A.   Pretest score   Final test score
    101       142    146         32               49
    102       141    144         49               62
    103       145    141         37               54
    104       144    141         46               61
    105       139    132         28               35
    106       143    139         38               56
    107       143    150         40               53
    108       143    149         36               50
    109       142    135         26               33
    110       144    145         24               43
    111       148    157         45               52
    112       139    147         42               52
    113       151    124         35               47
    114       141    129         49               50
    115       150    134         38               39
    116       145    142         42               53
    117       147    141         41               56
    118       142    142         38               47
    119       162    151         34               48
    120       184    134         23               44
    121       151    126         51               64
    122       140    138         37               47
    123       141    141         33               44
    124       148    134         22               32
    125       145    164         47               58
    126       164    126         26               38
    127       141    147         42               58
    128       144    152         36               34
    129       149    137         24               42
    130       140    157         47               61
    131       133    121         38               52

Primary Data for Schools Comprising Experimental Groups

Individual   C.A.   M.A.   Pretest score   Final test score
      1       145    145         29               48
      2       155    154         41               51
      3       137    159         47               62
      4       138    148         41               54
      5       142    156         44               62
      6       148    169         41               57
      7       148    163         49               62
      8       144    146         52               65
      9       147    150         47               59
     10       145    149         40               54
     11       146    137         41               51
     12       140    142         42               55
     13       147    146         41               51
     14       146    153         44               57
     15       143    154         27               42
     16       140    153         29               44
     17       142    140         29               42
     18       142    156         39               49
     19       139    152         41               56
     20       142    149         37               50
     21       141    133         24               39
     22       138    151         35               51
     23       144    142         26               45
     24       134    151         38               50
     25       142    154         43               58
     26       143    138         28               50
     27       141    144         35               55
     28       141    151         32               53
     29       146    153         32               42
     30       137    150         47               57
     31       135    158         38               52
     32       137    163         44               53
     33       137    160         45               60
     34       148    143         28               45
     35       150    142         38               49
     36       140    156         52               63
     37       127    174         45               59
     38       141    143         36               57
     39       143    155         41               51
     40       139    159         45               58
     41       148    142         35               49
     42       146    137         39               52
     43       145    146         39               50
     44       138    146         44               57
     45       140    140         36               53

Primary Data for Schools Comprising Experimental Groups (Continued)

Individual   C.A.   M.A.   Pretest score   Final test score
     46       141    149         36               48
     47       153    150         27               46
     48       140    156         33               40
     49       145    166         44               63
     50       137    169         20               45
     51       146    159         38               52
     52       145    130         30               44
     53       140    159         32               45
     54       141    155         43               57
     55       156    140         31               52
     56       136    149         37               58
     57       140    152         33               57
     58       143    138         30               44
     59       145    145         32               46
     60       145    140         38               55
     61       140    160         50               68
     62       146    122         23               41
     63       140    147         36               50
     64       139    162         47               59
     65       147    143         37               52
     66       143    147         42               61
     67       141    137         34               46
     68       145    143         36               49
     69       137    142         34               50
     70       157    120         17               31
     71       139    152         41               48
     72       146    141         25               43
     73       146    137         18               29
     74       139    164         39               55
     75       129    136         36               46
     76       145    163         40               52
     77       139    151         15               26

7. Analyze the data in Problem 6, using the Johnson-Neyman technique
and setting up the region of significance if it exists. Contrast this
technique with that of analysis of variance and covariance (see Ref. 6).

8. How can the analysis of variance technique be used in problems of
estimation, that is, in the detection and estimation of components of
random variation associated with a composite population? (See
Ref. 1.)


CHAPTER XII


THE PRINCIPLES OF EXPERIMENTATION 

There is an increasingly general realization that a formal experiment 
is an exacting enterprise designed and carried through with meticulous 
care to answer a few definite questions. The ability to formulate productive
hypotheses and to design experiments to test them is the mark
of a first-rate research worker or scientist. An understanding of the 
principles underlying modern designs is essential at every stage of an 
experiment if the primary data are to be collected in such a way as to 
provide the basis for valid inference and so as to enable the maximum 
amount of information to be elicited from them most efficiently. Perhaps 
a clearer grasp of the requirements underlying sound experimentation 
can be gained by the scientific reader through studying and examining 
designs that lead to valid conclusions. He should apply the techniques 
to actual problems, however, since difficulties usually tend to disappear 
on such closer experience. 

The whole subject of complex experiments is undergoing rapid development
as new possibilities of the methods and of their correct application
become better understood. The principles of experimentation, which 
originated in agriculture, are finding increasing application in many 
fields of science. The difficulties met with in application in one field 
are not identical with those in other fields, but many are similar. The 
solutions of problems arrived at in one field are often of material help in 
another. Where fields differ fundamentally, new techniques are necessary.
Such needs are discovered only in direct contact with the obstacles
themselves. Because modifications and extensions of the principles of 
design are capable of, and will undoubtedly have, ever wider application, 
the student of modern methods and statistical analysis needs to know 
how to apply these principles and how to read intelligently the reports of 
research workers who have used them. 

Modern ideas of experimental design differ sharply from earlier or 
traditional ones. It has long been an admonition in philosophical 
treatises of scientific experiment to hold constant all except one of the 
factors in a complex so that its effect may be determined. The experimenter
is advised to arrange an experiment so as to make it as sensitive
as possible with respect to one question but as insensitive as possible with 
respect to all others. Just as mathematical development has been 
biased toward physics, so has the direction of experimentation been
largely determined by the pattern of experimentation set up in physics,
which emphasizes the importance of varying the essential conditions 
only one at a time. The difficulty in applying such a principle, particularly
in branches of science where the data of the research worker
are subject to all sorts of fluctuations, had long been recognized by 
critical workers. The liberation of the research worker from stereotyped 
experimentation is relatively recent. 

The problem underlying the development of procedures appropriate
to deal with types of variable material is twofold: one aspect deals with
the design or logical structure of the experiment, the other with the
analysis and interpretation of the results. The development of the
logical structure underlying the whole technique of modern experimental 
design and of the appropriate statistical tools for the analysis and interpretation
of the results of such experiments is largely due to R. A. Fisher.
Beginning his work in 1919 with the founding of the statistical laboratory
at Rothamsted (Harpenden, England), Professor Fisher has revolutionized
the science of statistics and the principles of designing biological
experiments. His principles of experimentation and methods of statistical
analysis are finding increasing application in many fields of science,
particularly wherever the basic materials are variable. The possibilities 
of applying these principles also to the improvement of physical and 
chemical experimentation have barely been recognized. In biophysics 
and biochemistry these principles are likely to become increasingly 
important. 

The subject of the design of experiments is too large and too important
to scientific workers for it to receive incidental treatment only. In
his text The Design of Experiments Fisher presents the framework of 
scientific inference and the principles of modern experimentation. Our 
discussion is limited to a brief consideration of the major characteristics 
of modern experimental designs. We are especially interested in the 
role which statistical procedures play in serving the requirements of 
sound experimental design and in furnishing the means for unambiguous 
interpretation. 

The Self-contained Experiment. A principle of general utility in 
statistical analysis is to rely upon the evidence from the data themselves 
when allowances are to be made for certain inequalities, as in certain 
comparisons under consideration. Arbitrary corrections based on an 
a priori basis without reference to the information provided by the data 
themselves cannot lead to convincing conclusions. Violations of statistical
principles of this kind, though not so obvious a misuse of statistical
analysis as is an arbitrary selection among observational data previous
or subsequent to collection, are probably the source of the political principle
that “anything can be proved by statistics,” or of the crescendo
“lies, damned lies, statistics.”

Fisher sets up the self-contained experiment as the model for the
research worker and describes the properties which such a model must 
possess. Although progress in science may result from the better ordering
of the experiences we have had, it is chiefly in the collection of new
experiences that advancement takes place. However, if these experiences 
are to afford a secure basis for bringing new knowledge into being, they 
must be planned in advance in accordance with principles that make 
such outcomes possible. Thus, experimental observations are essentially 
experiences formulated at the time of arranging for their collection. 
Experimental observations are related to existing bodies of scientific 
knowledge as new observations are carried out to test theories growing 
out of the previous collection of data. Theories in turn become modified 
and reformulated as an outcome of the new observations. But once 
an experiment has been designed and executed, its interpretation must be 
based on its own evidence. The purpose, therefore, of making an experiment
self-contained is to make possible the valid and unequivocal interpretation
of its results without referring for decision or settlement or
consideration to other experiments or to the aggregate of experiences of
prior collection. The principle that an experiment should be self-contained
determines the essential difference between mere statistical
observations and those which are collected in accordance with a clearly 
conceived plan. 

The Function of Controls. A primary requisite of the principle that 
an experiment should be self-contained is the necessity of supplying a 
control or controls, that is, the need to base all conclusions concerning 
the differential effect of two or more contrasting treatments on the 
differences in the response or reaction of two or more similar bodies of 
experimental material. By the use of controls, experiments become 
comparative and not merely absolute. Absolute information is usually 
of little interest or importance. The reasoned explanation of the function
of controls is clearly illustrated by the following example (Ref. 2).

Assume that an experimenter working with animals injected some 
fluid into 3 rabbits and found that all 3 got violent and prolonged convulsions
followed by death within an interval of 24 hours. In support
of his conclusion that the injected substance was the cause of the death of 
the animals, the experimenter might draw from his own previous experi¬ 
ences or from those of rabbit breeders in general. Admittedly, only 
rarely would three designated animals die in the way described within 
such a short period of time. How would the conclusion have been 
made stronger if the experimenter had taken the precaution to inject 
a number of control rabbits with a neutral substance at the same time 
at which he injected his experimental animals? The answer to this 
question provides the rationale underlying the use of controls. It is 
that the controls are used to exclude, at a designated level of probability,
a number of alternative interpretations of the experimental results:
possibilities which have individually and collectively an unknown
probability of having occurred. For example, the rabbits might have
been ill from tetanus, hydrophobia, cholera, or some other unsuspected
epidemic disease; perhaps the needle was infected with a poisonous
substance; or it might be that the experimenter’s stock was genetically
of a kind which reacted in this way in general to injections. Suppose,
however, that the experimental rabbits had been randomly chosen from
the whole herd, the controls included. Then, if their reaction was
clearly different from that of the controls, there would be available a precise
measure of the probability that causes other than the experimental factor
had brought about the observed result. The probability is based
exclusively on the number of rabbits used, completely independent of all
prior experience of these animals. Assume, for instance, that 5 control
rabbits had been selected at random from the total number and, after
having been injected with distilled water, had not died of convulsions.
The measure of probability is obtainable from a simple application of
permutations and combinations.

There are 56 ways of choosing a group of 3 objects out of 8. If the
3 objects were to be selected consecutively, there would be successively
8, 7, and 6 objects to choose from and, therefore, the succession of
choices could be made in 8 × 7 × 6, or 336, ways. This number represents
not only every possible set of 3 but also every possible set in every
possible order. Three objects can be arranged in order in 3 × 2 × 1, or
6, ways. The number of possible choices is found by dividing 336 by 6,
which is 56. The result, 56, is essential for the interpretation of the
experimental results. The 56 sets of 3 which might be chosen would be
distributed among the possible events as follows:

Number dying      f
     0           10
     1           30
     2           15
     3            1
   Total         56


The probability of the observed difference, if it were not attributable 
to the material injected, is, therefore, 1 in 56, or a probability level of 
.018, which by the usual standards may be regarded as significant. It is 
also worth noting that the use of the controls serves to transform the 
quality of the experimental evidence by making it strictly objective for 
others who have not undergone the experiences of the experimenter. 
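
The count above can be checked by brute force. The following Python sketch is an illustration added here, not part of the original example; it assumes the 8 rabbits are labelled 0 to 7 and that the 3 that died carry the labels 0, 1, and 2 (the labels and the function name are hypothetical). It enumerates every possible set of 3 and tallies how many of the dying animals each set contains.

```python
from collections import Counter
from itertools import combinations
from math import comb


def dying_distribution(total=8, dying=3, chosen=3):
    """Tally the sets of `chosen` animals by the number of dying animals they hold."""
    died = set(range(dying))  # hypothetical labels for the animals that died
    counts = Counter()
    for group in combinations(range(total), chosen):
        counts[len(died & set(group))] += 1
    return dict(sorted(counts.items()))


dist = dying_distribution()
n_sets = comb(8, 3)            # 8 * 7 * 6 / (3 * 2 * 1)
p_value = dist[3] / n_sets     # chance that a random set of 3 is exactly the 3 that died

print(dist, n_sets, round(p_value, 3))
# {0: 10, 1: 30, 2: 15, 3: 1} 56 0.018
```

Only one of the 56 equally likely sets coincides with the three animals that died, which is the 1-in-56 probability quoted in the text.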

The weight of previous or outside evidence is much less still when the
object of the experiment is quantitative, because such evidence is usually
very indefinite or highly variable. Thus, the essential condition for
controlling the interpretation of experimental results is the provision of 
comparisons between two or more unlike variants. 

The Valid Estimate of Experimental Errors. The second requirement 
of a self-contained experiment is that it must hold within itself the 
possibility of securing a valid estimate of the experimental errors which 
really influence the comparisons made. That is, it is necessary to estimate
the error from the data of the experiment itself, because it is only
under such conditions that proper confidence can be put in the result of
the experiment. In any experiment there are factors which are susceptible
to some degree of control by the experimenter. But their effect
cannot be entirely eliminated, owing to chance fluctuations. Many of
the factors giving rise to these fluctuations which affect performance
are small in size and random in incidence, so that it is impossible to
present an exhaustive list of all the sources of variation in the experimental
material. It is customary to designate the component of variation
associated with the random variation of the experimental material
as experimental error. The errors do not follow any known exact laws, 
and so the laws of chance are usually designated as descriptive of their 
distribution. 

As was pointed out in the discussion of analysis of variance, it is 
assumed that the experimental errors to which the experimental observations
are subject shall be independently and normally distributed with
the same variance. The importance of the experiment making possible 
a valid estimate of the experimental errors is indicated by the fact that 
only under such conditions is it possible to apply to the experimental 
results tests of their significance which are disconnected from all past 
experience and are hence capable of adding new knowledge. Therefore, 
the design of a self-contained experiment involves the consideration of 
means of affording a valid estimate of error as well as ways of making 
possible an unbiased comparison between contrasted treatments. The 
validity of other estimates of error would depend on other mathematical 
assumptions which the particular method of estimation would introduce. 
There would be no objective reason for accepting such assumptions as 
true, if the experimenter has not taken the precautions needed to make 
them true. 

Replication. The first requirement of an experiment designed so 
that a valid test of significance may be applied in its interpretation is 
replication, the process of repeating the same treatment on more than one
object of the experimental test. The word “plot” is used in agricultural 
experimentation to indicate an individual plot or area of land. The 
“plot” could be an experimental animal or an individual, for instance. 
Replication is essential in the first place since it is a means of diminishing 
the experimental error. Just how this is done may become clear by
considering, first, certain factors contributing to the actual errors of the
experiment. The amount of information which it affords is known as
its precision. Fisher succeeded in quantifying the
concept of information so that now the precision is wholly a quantitative 
factor in the value of an experiment. 

There are a number of factors, both quantitative and qualitative, 
which may contribute to make the actual errors small. Some of these are 
the measurement of the criterion; the improvement in the techniques of 
controlling nonexperimental factors; care in ensuring that in the experimental
material the general conditions are those occurring in population
practice; the measurement of controls under as nearly as possible the
same conditions as those for the unknowns, including time; and the greatest
possible avoidance of hidden systematic errors as well as subjective
errors. Only when sufficient care has been given to ensure that working 
errors have been reduced to unimportant quantities can improvement 
of the replication and the organization of the structure or arrangement 
of the experiment be expected to achieve greatly increased precision or 
sensitiveness. The process of reducing working errors begins with 
reducing the largest sources of error, and it continues until sources of 
error that hitherto seemed inconsequential become significant by limiting 
the value of the whole enterprise. 

The second function of replication in an experiment is to provide 
the data from which the appropriate estimate of experimental error can 
be calculated. Thus replication performs the double service of reducing 
experimental error and of furnishing an estimate of the error that remains. 
Replication is the sole source of the estimate of error. To make certain 
that the estimate of error is unbiased requires as much attention in the 
design of an experiment as does the guarantee that any of the direct 
estimates are without bias. Furthermore, the unbiased estimate of 
experimental error is fundamental for the application of valid tests of 
significance by which the value and significance of the experiment are 
determined. Likewise, an unbiased estimate of error is a necessary 
condition if one is to assess the weight that may be given to the evidence 
of an experiment should its results differ from those of other experiments 
of the same sort. 

Since the precision of an experiment, as represented by the standard
error of the mean of any one treatment, improves in proportion to the square
root of the number of replications (the standard error itself is inversely
proportional to that square root), a larger difference between treatments
is needed to demonstrate a significant effect with a smaller than with a
larger number of replications.
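
The square-root relation can be made concrete with a few lines of Python. This is only a sketch under stated assumptions: the error standard deviation of 10 units is invented for illustration, and the multiplier of 2 is a rough stand-in for whatever critical value the chosen test of significance would supply. Neither figure comes from the text.

```python
import math


def standard_error(sigma, n):
    """Standard error of one treatment mean based on n replications."""
    return sigma / math.sqrt(n)


def least_detectable_difference(sigma, n, multiplier=2.0):
    """Rough size of a difference between two treatment means (n replications
    each) needed for significance; `multiplier` is an assumed critical value."""
    return multiplier * sigma * math.sqrt(2.0 / n)


sigma = 10.0  # assumed error standard deviation, for illustration only
for n in (4, 16, 64):
    print(n, standard_error(sigma, n), round(least_detectable_difference(sigma, n), 2))
```

Each fourfold increase in the number of replications halves both the standard error and the difference that must be observed before it can be called significant.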

The argument is sometimes advanced that the results are good enough 
if there is reason to believe that the estimate of error is at least not an 
underestimate. Fisher points out that the danger of the fallacy of
assuming oneself to be “on the safe side” is that there is no security in admitting
a bias in either direction. The effect of overestimating the error may be
to prevent the experimenter from drawing a conclusion which the experiment
justly substantiates. Such a practice could lead to the belief that
an effect is inconsequential when it is not, and so to ignoring a real cause
of disturbance in the design of subsequent experiments. Because of the
exploratory and tentative character of much research, a promising line 
of inquiry might be given up through a failure to discern the clue which 
the experiment might otherwise have provided. 

Randomization. It is essential in an experiment to recognize that
equalization is approximate to a greater or lesser degree, no matter how 
much care and experimental skill are exerted in attempting to equalize 
the nonexperimental conditions which are likely to influence the result. 
In many significant practical situations the attempts at equalization are 
definitely inadequate. It becomes of fundamental importance that this
inequality shall not lead to biased estimates and invalid tests of significance.
The essential safeguard is included in the experimental
procedure by a process which is known as randomization. Just how this
operation works can be explained by considering again the origin of error.
The real errors of the experimental results originate from differences 
in the nonequalization of the nonexperimental conditions among the 
objects or groups of objects that are treated differently. The estimates 
of error are secured from the discrepancies among the objects treated 
alike. Consequently, it is necessary only to make certain that any two 
objects that may be treated alike have the same probability of being so 
treated. Likewise, if treated differently, the objects must have the same 
probability of being so treated, in each of the ways in which this is possible.
This precaution is necessary to assure that each component of
error which may influence the experimental results may with equal 
frequency furnish the data used in the estimate of error. The calculus of 
probability and the mechanism of the statistical theory of sampling distributions
can then be applied with confidence.

Randomization, then, is the procedure of making certain that the 
probabilities of being subjected to like treatment are equal for every 
relevant pair of objects in the experiment. It is worthy of note that the 
object of randomization is not to increase the precision of the experiment 
but only to guarantee that whatever precision the experimental arrangement
is capable of providing is neither over- nor underestimated. Systematic
arrangements of plots or objects in contrast to random
arrangement have been shown to give consistently either an over- or an 
underestimate of error. 
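
One simple way of carrying out such a randomization is sketched below in Python. It is an illustration rather than a procedure taken from the text: the function name, the treatment labels, and the choice of equal replication are assumptions. Shuffling a list that contains each treatment label the same number of times gives every pair of objects the same probability of being treated alike.

```python
import random


def randomize(objects, treatments, seed=None):
    """Assign treatments to objects at random, with equal replication."""
    objects = list(objects)
    if len(objects) % len(treatments) != 0:
        raise ValueError("objects must divide evenly among treatments")
    reps = len(objects) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)   # a seed makes the recorded plan reproducible
    rng.shuffle(labels)         # every ordering of the labels is equally likely
    return dict(zip(objects, labels))


# Hypothetical example: 8 animals, 4 to be injected and 4 kept as controls.
plan = randomize(range(1, 9), ["injected", "control"], seed=1)
for animal, treatment in plan.items():
    print(animal, treatment)
```

Recording the seed along with the plan gives the definite statement of the randomization actually followed, on which the later analysis depends.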

Controls, replication, and randomization have been discussed as 
the essential aspects of the principle that an experiment should be 
self-contained. 

Relationship between Experimental Design and Statistical Analysis. 
The relation between experimental procedure and statistical analysis
will now be considered more fully. It is apparent from the discussions
of experimental design that a substantial number of the ideas or concepts
are of a statistical nature. In fact, a clear understanding of the statistical
procedures used is an essential part of the understanding of the
principles of experimentation. These procedures serve to fulfill the 
requirements of intelligible and accurate experimental design and to 
provide the machinery of unequivocal interpretation. We note, then, 
that the question of experimental procedure and that of statistical analysis
are two aspects of the single problem: the problem of fulfilling the
requisites of the operations involved in making additions to scientific 
knowledge by experimentation. 

An analysis of the relationship between the two aspects reveals that 
once the practical experimental procedure is established, only one method 
of statistical analysis can be valid. Furthermore, a fact of great practical
significance is that the validity of the statistical analysis depends upon
the introduction of a random element in the arrangement of the objects
of the experiment. A definite and complete statement of the specific
process of randomization followed determines in advance the correct
statistical method to be applied to the experimental results. The logical
organization of each of the possible types of randomization is set forth
by the analysis of variance. The neatness of the arrangement of calculations
and the facility of their interpretation in the analysis-of-variance
table are greatly appreciated by the modern research worker. The
compactness and simplicity of this form of summarizing the results as
well as the logical structure of the experiment have added greatly to the
intelligibility and accuracy of its interpretation. The logical structure
of the experiment is shown by the division of the total number of degrees
of freedom, the independent comparisons, corresponding to each of the
sums of squares calculated.

The development of principles improving the art of experimentation 
has been concomitant with that resulting in tools suitable to analysis of 
experimental results. The standardized methods of statistical analysis 
were designed largely on the basis of a mathematical theory in which the 
problems underlying experimental designs of more recent origin had not 
been explicitly considered. (It has been previously pointed out how
“Student’s” discovery of the t-distribution and Fisher’s extension to the
z-distribution made exact tests of significance possible, both for small and
for large samples.) The modern advances in experimental design have
brought about an increased awareness in practical work of the numerous 
different sources of variation affecting experimental and observational 
material. Exact tests of significance and the technique of the analysis 
of variance are indispensable in the assessment of these various components
of variation.

We should not overlook the mathematical framework upon which the 
modern tools of scientific value have been built. This framework gives
precision to tests of hypotheses concerning factors giving rise to variation
and to experiments planned to yield maximum information. 

The statistical treatment of the results of replicated experiments is 
usually established on the assumption of the normal law of error, and the 
general formulation of the analysis is drawn from the method of least 
squares. It is essential for the correct application of the method of least
squares that any components of variation not removed by the experimental
design be normally and independently distributed. If these
conditions are not fulfilled, the theoretical basis underlying tests of significance
breaks down and hence estimates and tests of significance are
invalidated. Thus, in the test of significance associated with the analysis 
of variance, it was assumed that the measured effects of the factors 
under experiment were statistically independent and normally distributed 
variates, all with the same variance but with possibly different means. 
Unless, therefore, the arrangement of experiments is balanced to fulfill 
the assumptions, the statistical reduction of the data would be very 
difficult, and convincing results would be impossible. Such a balanced 
arrangement is illustrated in Equation (10.04), page 214, where the entire 
calculation is much simplified by the fact that when the equation is 
squared and the terms are summed, the cross-products become zero. 
Another significant property is that the difference between the means 
for any one factor is independent of the other factors. 
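
The vanishing of the cross-products in a balanced arrangement can be verified numerically. The Python sketch below does not reproduce Equation (10.04), which lies outside this section; it assumes a complete two-way table with one observation per cell, and the figures in `data` are made up for illustration. It shows that the total sum of squares then splits exactly into row, column, and residual parts, which is possible only because the cross-product terms sum to zero.

```python
def two_way_sums_of_squares(table):
    """Return (total, rows, columns, residual) sums of squares for a
    complete two-way table with one observation per cell."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    total = sum((x - grand) ** 2 for row in table for x in row)
    rows = c * sum((m - grand) ** 2 for m in row_means)
    cols = r * sum((m - grand) ** 2 for m in col_means)
    resid = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    return total, rows, cols, resid


data = [[29, 48, 41], [47, 62, 54], [44, 57, 49]]  # invented observations
total, rows, cols, resid = two_way_sums_of_squares(data)

# In a balanced layout the three parts add up to the total with nothing left over.
print(abs(total - (rows + cols + resid)) < 1e-9)
```

If some cells were missing or unequally replicated, the parts computed this way would no longer add up to the total, which is one reason the statistical reduction of unbalanced data is so much harder.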

The validity of the method of least squares as the basis for the testing 
of hypotheses by experimental results was secured by Fisher through the 
introduction of randomization into the design. It has been pointed 
out that systematic arrangements are apt to lead to biased results, because 
the necessary element of randomization is lacking and hence the test of 
hypotheses through results based on the method of least squares does not 
produce the same objective validity as does a test on experimental 
observations obtained from random arrangements. 

In spite of the fact that the relation between the material conduct 
of an experiment and its statistical interpretation must be used in planning
conclusive experiments, some experimenters continue to work with
variable material without such design and to obtain discordant results
incapable of being fitted into a scientific system. Controversies sometimes
arise because different experimenters get diverse results for the
same problem. In other cases, methods of statistical analysis are 
employed which result in definitely misleading estimates of error. Also, 
methods of experimentation are used which cannot give a valid test of 
experimental results. The common procedure of consulting a statistician 
or statistical principles after an experiment or investigation has been 
completed is equivalent to holding a post-mortem analysis. Perhaps 
the only interpretation of the data that can be made is to state from what 
the experiment died. But when research workers turn to sound methods
of statistical analysis which involve carefully planned experimental
designs, difficulties of the type enumerated above tend to disappear. 

Therefore, we can state that the most important work of the statistician
is to prepare the plan of the experiment or investigation in such a
way as to get the best answers to the questions raised. It has been 
demonstrated that a complete overhauling of the process of collecting, or 
of the experimental design, can often increase the precision tenfold or 
twelvefold for the same expenditure in time and labor. The modern 
research worker, therefore, needs statistical knowledge not only for work¬ 
ing out the results but also for designing: unless he has a working knowl¬ 
edge of the technique he employs, he cannot conduct his experiment 
properly. In planning an experiment, it is especially important to give 
due attention to possible results and their interpretation. The experi¬ 
menter must be induced to use his imagination, and to anticipate the 
confusion and difficulties that will assail his investigation if they are not 
foreseen. 
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CHAPTER XIII 

APPLICATIONS OF THE PRINCIPLES OF EXPERIMENTATION 

We now proceed to show the application of the principles of experimentation 
to certain cases of technical importance. Our emphasis is 
upon the interpretation of the experimental results and the fundamental 
part which statistical methods, particularly those of analysis of variance 
and covariance, play in this process. 

Let us take one of the simplest designs planned to compare the actions 
of two like individuals under contrasting conditions. A biologist might 
wish to determine the effect of the removal of a deep-seated organ of an 
animal. As a control he would perform a similar operation upon another 
animal of the same kind but in which the organ under investigation would 
not be disturbed. In this way the experimenter attempts to make the 
situations alike in all respects except the factor to be tested. Such 
perfect experimental control is an ideal desideratum which is never 
capable of complete fulfillment. It is, however, a basic principle upon 
which experimentation depends. 

The Single-Factor Experiment. The method of pairing takes into 
account two desiderata in experimental design: (1) the requirement of 
homogeneous experimental material so that the sensitivity of each 
individual observation may be enhanced, and (2) the need for multiplying 
the number of observations in order to reveal the reliability and the 
consistency of the results. The two coupled individuals would, presumably, 
react alike under the same treatment, and the difference observed 
under contrasting treatment measures the differential treatment effect. 
A minimum of two pairs, or replications, is required, since with a single 
pair it would be impossible to say whether any difference in behavior detected 
is due to the difference in treatments, to the particular variability of the 
individuals, or to both jointly. The differences between the measurements 
of the respective pair members constitute the experimental data 
from which inferences are to be drawn. Which individual of a particular 
pair shall receive the one or the other of the two treatments is determined 
by a random process. If treatments are randomly assigned, 
replication serves to equalize the effect of uncontrolled sources of variation. 
It is the variation among the several differences that is used in 
estimating experimental error. By comparing the mean difference 
attributable to the differential effect of the treatments with the standard 
error of the mean difference, the significance of the results of the experiment 
is to be determined. 

We have previously examined the statistical method for reducing the 
data obtained from an experiment purporting to be of a single-factor 
type (page 75). The difference between the achievement scores of two 
individuals paired on the basis of their potential learning capacities was 
computed for each of the 25 pairs. The null hypothesis was tested that 
these differences constituted a random sampling from a population of 
such differences distributed about a mean of zero in a normal manner. 
The criterion, t, was set up for testing the first aspect of this hypothesis. 
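
The reduction of a paired experiment to a single criterion, t, may be sketched in Python as follows. This is an illustration only: the function names `assign_within_pairs` and `paired_t` and the six differences are our own assumptions and are not the data of the experiment on page 75.

```python
import math
import random


def assign_within_pairs(n_pairs, seed=None):
    """For each pair, pick at random which member (0 or 1) receives treatment 1."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n_pairs)]


def paired_t(differences):
    """Return (t, degrees of freedom) for the hypothesis of zero mean difference."""
    n = len(differences)
    mean_d = sum(differences) / n
    # Sum of squared deviations of the differences about their own mean.
    ss = sum((d - mean_d) ** 2 for d in differences)
    # Standard error of the mean difference.
    se_mean = math.sqrt(ss / (n - 1) / n)
    return mean_d / se_mean, n - 1


# Hypothetical differences for six pairs (not data from the text).
differences = [3, 5, -1, 4, 2, 6]
t_value, df = paired_t(differences)
assignment = assign_within_pairs(len(differences), seed=0)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The value of t so obtained would then be referred to the tabled value for n - 1 degrees of freedom.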

The method of replicated comparison of individuals, by pitting each 
individual against another individual of like kind in conditions made as 
equal as possible, is a simple and effective experimental design for testing 
the differential effect between two treatments. It is, however, limited to 
situations where the presumed effect of a single factor can be measured 
under the controlled conditions prescribed for the validity of the method. 
In practice, these conditions are not often present. Furthermore, it is 
usually desirable to test the effects of more than two treatments. The 
need for broadening the scope and comprehensiveness of experimental 
inquiries has led, therefore, to the extension of replicated comparisons of 
individuals or groups of individuals to more and more complex situations. 
In this extension, the subdivision of the experimental material into relatively 
homogeneous series is a fundamental part of the process, as was 
observed in the paired experiment. Just as the advance in systematic 
sampling has been made possible by utilizing prior knowledge of the 
population sampled, so the utilization of knowledge of how to subdivide 
the experimental material profitably has played an important part in 
the evolution of experimental design. The principle that the process of 
subdivision can be advantageously duplicated is also operative. The 
smallness of number or quantity of sufficiently homogeneous material 
circumscribes the number of different treatments rather than the number 
of replications that can be incorporated into an experiment. 

The Randomized-Block Arrangement. The experimental design 
known as the randomized block is a simple application of an experimental 
arrangement illustrating the principle of the subdivision of the experimental 
material into relatively homogeneous series. In this arrangement 
each treatment occurs equally frequently, most commonly once in each 
block, and the treatments are randomly allotted to the experimental 
units within the block. The term “block” may denote any group 
containing the required number of experimental units. In arranging the 
grouping so that similar experimental units are contained in the same 
block, the accuracy of the treatment comparisons is increased by eliminating 
from them the differences due to dissimilarities among the different 
blocks. The process of randomization guarantees that no treatment 
bias is introduced and permits an unbiased estimate of experimental 
error basic for the validity of the test of significance. 

Consider an experiment in nutrition on the relative effect of 4 different 
treatments A, B, C, and 0 (no treatment), which are randomly 
applied to 4 blocks of 4 children each chosen as nearly alike as possible 
with respect to age, height, and weight at the beginning of the experiment. 
The arrangement is represented in the following diagram: 



             Block I        Block II       Block III      Block IV
Children     1  2  3  4     5  6  7  8     9 10 11 12     13 14 15 16
Treatment    0  B  C  A     A  C  0  B     B  A  C  0      0  C  B  A
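
An allotment of the kind shown in the diagram may be produced by shuffling the treatments separately within each block. The following Python sketch is offered as an illustration under our own assumptions: the function name `randomize_blocks` and the seed are hypothetical, and the allotment it prints will not in general coincide with the one in the diagram.

```python
import random

TREATMENTS = ["A", "B", "C", "0"]  # "0" stands for no treatment


def randomize_blocks(treatments, n_blocks, seed=None):
    """Return one independent random ordering of the treatments per block."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # each treatment occurs exactly once in the block
        plan.append(order)
    return plan


plan = randomize_blocks(TREATMENTS, n_blocks=4, seed=1)
for number, order in enumerate(plan, start=1):
    print(f"Block {number}: {' '.join(order)}")
```

Because the shuffling is repeated independently in every block, each treatment appears once per block while its position within the block is left to chance.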


We give the analysis for the general case where k denotes the number 
of blocks and p the number of treatments. Then the equation for the 
sums of squares is 


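$$
\underbrace{\sum^{pk} (X - \bar{X})^2}_{(1)}
= \underbrace{p \sum^{k} (\bar{X}_b - \bar{X})^2}_{(2)}
+ \underbrace{k \sum^{p} (\bar{X}_t - \bar{X})^2}_{(3)}
+ \underbrace{\sum^{pk} (X - \bar{X}_b - \bar{X}_t + \bar{X})^2}_{(4)}
\tag{13.01}
$$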


where $\bar{X}_b$ is the mean of a block, $\bar{X}_t$ is the mean of a treatment, and $\bar{X}$ 
is the grand mean. The corresponding equation for the degrees of freedom is 

$$
\underbrace{pk - 1}_{(1)} = \underbrace{(k - 1)}_{(2)} + \underbrace{(p - 1)}_{(3)} + \underbrace{(p - 1)(k - 1)}_{(4)}
\tag{13.02}
$$


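The following formulas are used to calculate the sums of squares: 

(1) Total: $\sum^{pk} (X - \bar{X})^2 = \sum^{pk} (X^2) - \dfrac{T^2}{pk}$ (where $T$ = grand total for all plots) 

(2) Blocks: $p \sum^{k} (\bar{X}_b - \bar{X})^2 = \dfrac{\sum^{k} (T_b^2)}{p} - \dfrac{T^2}{pk}$ (where $T_b$ = total for one block) 

(3) Treatments: $k \sum^{p} (\bar{X}_t - \bar{X})^2 = \dfrac{\sum^{p} (T_t^2)}{k} - \dfrac{T^2}{pk}$ (where $T_t$ = total for one treatment) 

(4) Error: $\sum^{pk} (X - \bar{X}_b - \bar{X}_t + \bar{X})^2 = (1) - (2) - (3)$ (subtract blocks and treatments from total) 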
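
The four computing formulas may be sketched in Python as follows. The function name `randomized_block_anova` and the 3 × 3 table of observations are our own assumptions, introduced only to show the arithmetic; they are not data from the text.

```python
def randomized_block_anova(table):
    """Sums of squares for a randomized-block layout.

    table[b][t] is the single observation in block b under treatment t.
    """
    k = len(table)       # number of blocks
    p = len(table[0])    # number of treatments
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (p * k)          # T^2 / pk

    total = sum(x * x for row in table for x in row) - correction
    blocks = sum(sum(row) ** 2 for row in table) / p - correction
    treatment_totals = [sum(row[t] for row in table) for t in range(p)]
    treatments = sum(tt ** 2 for tt in treatment_totals) / k - correction
    error = total - blocks - treatments              # (1) - (2) - (3)

    df_error = (p - 1) * (k - 1)
    f_treatments = (treatments / (p - 1)) / (error / df_error)
    return {
        "total": total,
        "blocks": blocks,
        "treatments": treatments,
        "error": error,
        "df_error": df_error,
        "F_treatments": f_treatments,
    }


# Hypothetical observations: 3 blocks (rows) by 3 treatments (columns).
observations = [
    [10, 12, 14],
    [11, 15, 16],
    [9, 12, 15],
]
result = randomized_block_anova(observations)
print(result)
```

The error sum of squares is obtained here by subtraction, exactly as in formula (4), and the function assumes one observation per treatment in each block.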

We may further illustrate the principles of the randomized-block 
design by applying them to the experiment presented on page 75. Here 
there are only two treatments, which are assumed to have been randomly 
assigned to the members of the respective 25 pairs. Each pair corresponds 
to a block, and each individual in a pair or block is an experimental 
unit. The logical structure of this type of experimental design, specified 
by the process of randomization carried out, is sorted out by the analysis 
of variance. In this case each item is classified by two criteria; for 
example, an individual achievement score is classified by treatment and 
membership in a particular pair. The analysis is carried out as follows: 

Step 1. The measures of treatment effects for the respective members 
of the 25 pairs are given in columns (2) and (3) of Table 86. The difference 
and the sum of the treatment effects are given in columns (4) and (5). 
The sums and sums of squares are calculated and recorded in the last two 
rows, respectively. 

Step 2. Calculate the sum of squares for differences: 

$$
\sum (X_1 - X_2)^2 - \frac{\left[\sum (X_1 - X_2)\right]^2}{n} = 8962 - \frac{(232)^2}{25} = 6809.04
$$

This sum of squares is then divided by 2, the number of achievement 
scores involved in each difference. This is done to obtain the per-individual 
measure of the variation of these differences, since the variance 
of the difference is an estimate of $2\sigma^2$ (see page 37). The quotient 
6809.04/2 = 3404.52 is entered in Table 87 as interaction or experimental 
error. If the differences among the individuals of the respective pairs 
had been the same, there would have been no interaction. Thus, the 
source of measurement of the experimental error is the uncontrollable 
variation of these differences. 


TABLE 87 

Analysis of Variance of the Achievement Test Scores in Algebra of the 25 Pairs of Students 

Source of variation                  D.F.   Sum of squares   Mean square     F      Hypothesis
Interaction or experimental error     24       3,404.52         141.855
Between pairs                         24      16,004.92         666.870     4.70    Rejected
Between treatments                     1       1,076.48       1,076.480     7.58    Remains in doubt
Total                                 49      20,485.92


Step 3. Compute the sum of squares from the sums: 

$$
\sum (X_1 + X_2)^2 - \frac{\left[\sum (X_1 + X_2)\right]^2}{n} = 200{,}438 - \frac{(2052)^2}{25} = 32{,}009.84
$$

Here, as in the case of the difference, since the sum is made up of two 
achievement scores, the comparable sum of squares for pair variation is 
½(32,009.84) = 16,004.92. This value is entered in Table 87 as the 
“between pairs” source of variation. 

Step 4. The sum of squares to measure the variation assigned to 
treatment effects is obtained as follows. The mean of the two treatment 
totals is 

$$
\tfrac{1}{2} \sum (X_1 + X_2) = \tfrac{1}{2}(2052) = 1026
$$

The two deviations are 1142 - 1026 = 116 and 910 - 1026 = -116. 
The sum of their squares is 26,912. This sum is required on a per-pair 
basis and is therefore divided by 25. The quotient, 26,912/25 = 1076.48, 
is entered as the measure of variation due to treatment in Table 87. 

Step 5. The total sum of squares calculated independently provides 
a check on the calculations. It is given by 

$$
\left[\sum (X_1)^2 + \sum (X_2)^2\right] - \frac{\left[\sum (X_1 + X_2)\right]^2}{2n} = 104{,}700 - \frac{(2052)^2}{50} = 20{,}485.92
$$

This value is recorded in the total row of Table 87. 

Step 6. The total number of degrees of freedom is 1 less than the 
number of individual achievement scores, or 50 — 1 = 49. The 25 
differences and the 25 sums each contribute 24 degrees of freedom; the 
two treatments, 1. Thus, the additive property applies to the degrees of 
freedom as well as to the sum of squares. 

Step 7. Tests of significance can now be applied to the results 
recorded in the analysis-of-variance table. The differential effect due 
to variation in treatment is found to be F = 1076.48/141.855 = 7.58, a 
value significant at the 5 per cent level. The table values for F corresponding 
to d.f. 1 and 24 are: $F_{.05}$ = 4.26; $F_{.01}$ = 7.82. A similar 
finding was given by the t-test (page 78), where t = 2.75 and $t_{.01}$ = 2.797 
for d.f. = 24. This is a demonstration of the fact pointed out on page 
55, that if there is only 1 degree of freedom as in this experiment of two 
treatments, $F = t^2$. Thus, $t^2$ = 86.1184/11.3484 = 7.58. 

The test of significance for the differences between the means of the 
pairs is given by F = 666.87/141.855 = 4.70. This value is significant 
at the 1 per cent level; the value of F for d.f.'s of 24 and 24 is $F_{.01}$ = 2.66. 
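
Steps 2 to 7 can be checked from the summary figures alone. The Python sketch below uses only totals quoted in the steps above; the sum of the differences, 232, is taken as the difference of the two treatment totals, 1142 - 910. The variable names are our own.

```python
n = 25                                # number of pairs
t1, t2 = 1142, 910                    # treatment totals
sum_d = t1 - t2                       # sum of the differences X1 - X2
sum_d2 = 8962                         # sum of squared differences
sum_s = t1 + t2                       # sum of the sums X1 + X2 (2052)
sum_s2 = 200438                       # sum of squared sums
raw_ss = 104700                       # sum of X1^2 plus sum of X2^2

# Step 2: interaction (experimental error), on a per-individual basis.
ss_error = (sum_d2 - sum_d ** 2 / n) / 2
# Step 3: between pairs, on a per-individual basis.
ss_pairs = (sum_s2 - sum_s ** 2 / n) / 2
# Step 4: between treatments.
mean_total = sum_s / 2
ss_treatments = ((t1 - mean_total) ** 2 + (t2 - mean_total) ** 2) / n
# Step 5: total, computed independently as a check.
ss_total = raw_ss - sum_s ** 2 / (2 * n)

# Steps 6 and 7: mean squares and variance ratios.
ms_error = ss_error / (n - 1)
f_treatments = ss_treatments / ms_error
f_pairs = (ss_pairs / (n - 1)) / ms_error

# The paired t criterion, squared, for comparison with F on 1 and 24 d.f.
t_squared = (sum_d / n) ** 2 / ((sum_d2 - sum_d ** 2 / n) / (n - 1) / n)

print(round(ss_error, 2), round(ss_pairs, 2), round(ss_treatments, 2), round(ss_total, 2))
print(round(f_treatments, 2), round(f_pairs, 2), round(t_squared, 2))
```

The three component sums of squares add to the independently computed total, and the variance ratio for treatments agrees with the square of t, as stated in Step 7.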

The separation of the source of variation among the pairs illustrates 
the contribution of the experimental design to the precision of the experiment. 
If this source of variation had not been isolated, the variations 
among the pairs would have been included in the experimental error, 
thus substantially reducing the precision (see Table 88). Thus by using 
the randomized-block design in this case and putting equated individuals 
in each block, the variation among pairs has been controlled and isolated. 





TABLE 88 

Analysis of Variance of the Achievement-Test Scores of the 25 Pairs of Students without the Isolation-of-Treatment Effect 

Source of variation   D.F.   Sum of squares   Mean square     F      Hypothesis
Between pairs          24      16,004.92         666.87      3.38    Rejected
Within pairs           25       4,481.00         179.24
Total                  49      20,485.92


An objective basis for determining the increase in precision in using 
randomized blocks as compared with the use of two groups of random 
samples of students for the experimental comparison has been given by 
Yates (Ref. 21). The calculations are as follows: 

The error variance, 141.855, is substituted for the mean square of 
error (24 D.F.) and the mean square for treatment (1 D.F.). The 
corresponding sum of squares is found by multiplying the error variance 
by the combined degrees of freedom. Thus, (141.855) (25) = 3546.375. 
This product is added to the sum of squares for “between,” 16,004.92. 
Thus, 16,004.92 + 3546.375 = 19,551.295. This sum is then divided 
by the total degrees of freedom, 49. Thus, 19,551.295/49 = 399.005. 
The efficiency of randomized blocks as compared to random sampling 
equals 399.005/141.855 = 2.81, or 281 per cent. 
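
The calculation just described may be written as a short Python function. The function name `block_efficiency` and its argument names are our own; the figures passed to it are those of Table 87.

```python
def block_efficiency(ms_error, df_error, df_treatments, ss_blocks, df_blocks):
    """Estimated efficiency of randomized blocks relative to unrestricted random groups.

    The error variance is substituted for the error and treatment mean squares,
    the block sum of squares is added, and the whole is divided by the total
    degrees of freedom.
    """
    substituted_ss = ms_error * (df_error + df_treatments)
    ms_without_blocks = (substituted_ss + ss_blocks) / (
        df_error + df_treatments + df_blocks
    )
    return ms_without_blocks / ms_error


efficiency = block_efficiency(
    ms_error=141.855, df_error=24, df_treatments=1, ss_blocks=16004.92, df_blocks=24
)
print(f"efficiency = {efficiency:.2f}")
```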

Symmetrical Incomplete Randomized-Block Design. A useful modification 
of the randomized-block type of arrangement is the one known 
as the symmetrical incomplete randomized-block design. In this arrangement 
each block contains two units only, and all possible combinations of 
the treatments, taken in pairs, are included in the different blocks (Ref. 
21). This type of design has proved to be especially valuable in situations 
where the experimental material is naturally divisible into groups 
with fewer members than the number of treatments, all of which might 
be of experimental interest. The study of several treatment effects on 
such homogeneous groups as twins or triplets is an example. 

The Latin-Square Design. The experimental principle that the 
process of subdivision of the experimental material may be advantageously 
duplicated is best illustrated by the arrangement known as the 
Latin square. This type of design is similar in principle to a randomized-block 
arrangement, but in a Latin square two cross-groupings of the 
experimental units are carried out, corresponding to the rows and 
columns of a square. The treatments are subject to the double restriction 
that each treatment occurs once and once only in each row and in 
each column. Thus, the differences between rows and columns can be 
eliminated from the experimental comparisons. 

The appropriate process of randomization, necessary to ensure the 
validity of the test of significance applied to the experiment, consists in 
taking any square arrangement which fulfills the conditions of a Latin 
square and rearranging either the rows or the columns, or both, at 
random, and then assigning the treatments at random. The special 
methods which have to be used to assure complete randomization can 
be carried out by using the typical “transformation sets” tabulated by 
Fisher and Yates (Ref. 8). 
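
The rearrangement of rows, columns, and treatment letters may be sketched in Python as follows. This is a simplified illustration under our own assumptions: it starts from a cyclic square and does not reproduce the transformation sets of Fisher and Yates, so it should not be taken as giving every Latin square an equal chance of selection. The function names are hypothetical.

```python
import random


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def randomized_latin_square(treatments, seed=None):
    """Shuffle rows, columns, and letters of a cyclic Latin square."""
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic starting square: cell (r, c) holds index (r + c) mod n.
    square = [[(r + c) % n for c in range(n)] for r in range(n)]
    rng.shuffle(square)                       # rearrange the rows at random
    column_order = list(range(n))
    rng.shuffle(column_order)                 # rearrange the columns at random
    square = [[row[c] for c in column_order] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                       # assign the treatments at random
    return [[labels[cell] for cell in row] for row in square]


square = randomized_latin_square(["A", "B", "C", "D"], seed=7)
for row in square:
    print(" ".join(row))
```

Permuting rows, columns, or letters never destroys the Latin property, which is why the result can always be verified with `is_latin_square`.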

The structure of a Latin-square design is illustrated in Figs. 6 and 7 
and the appropriate statistical analysis follows. 

[Figure 6. Record for a single individual (table headed “Suit Presented”; totals 15, 5, 20, 10; grand total 50).] 

[Figure 7. A 4 × 4 Latin square.] 

Consider an experiment designed to test the telepathic powers of a 
large sample of individuals. Suppose that the experiment consists in 
presenting 50 playing cards in sequence, each card being drawn at random 
from the pack and then returned. Each subject reports his guess of the 
suit of the card drawn each time. Figure 6 is the record of a single 
individual. His score of correct assignments is the total of the frequencies 
in the diagonal cells, for example 12. No 2 cells of a set in the contingency 
table are in the same row or column and no cell is common to 
2 sets. The sets may be defined by the letters of a Latin square as in 
Fig. 7. 

More generally, let the letters A, B, C, D represent treatments in the 
4 × 4 Latin square. The “plots” are arranged in 4 rows and 4 columns 
and there must be as many treatments as there are rows and columns. 
The treatments are randomly assigned to the plots subject to the double 
restriction that each treatment can occur only once in any row or column. 

We give the analysis for the general case where n represents the number 
of rows, columns, and treatments. The equations for the sums of 
squares and degrees of freedom are as follows: 

$$
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{n} (X_{ij} - \bar{X})^2}_{(1)}
= \underbrace{n \sum_{r=1}^{n} (\bar{X}_r - \bar{X})^2}_{(2)}
+ \underbrace{n \sum_{c=1}^{n} (\bar{X}_c - \bar{X})^2}_{(3)}
+ \underbrace{n \sum_{t=1}^{n} (\bar{X}_t - \bar{X})^2}_{(4)}
+ \underbrace{\sum_{i=1}^{n}\sum_{j=1}^{n} (X_{ij} - \bar{X}_r - \bar{X}_c - \bar{X}_t + 2\bar{X})^2}_{(5)}
\tag{13.05}
$$

where $\bar{X}_r$ and $\bar{X}_c$ represent the means of rows and columns, respectively; 
$\bar{X}_t$ is the mean of a treatment; and $X_{ij}$ is the value of the item in the ith 
row and the jth column. 

The corresponding equation for the degrees of freedom is 

$$
\underbrace{(n^2 - 1)}_{(1)} = \underbrace{(n - 1)}_{(2)} + \underbrace{(n - 1)}_{(3)} + \underbrace{(n - 1)}_{(4)} + \underbrace{(n - 2)(n - 1)}_{(5)}
\tag{13.06}
$$


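The calculations for the sums of squares are as follows: 

(1) Total: $\sum_{i=1}^{n}\sum_{j=1}^{n} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} (X_{ij}^2) - \dfrac{T^2}{n^2}$ ($T$ = grand total of all plots) 

(2) Rows: $n \sum_{r=1}^{n} (\bar{X}_r - \bar{X})^2 = \dfrac{\sum_{r=1}^{n} (T_r^2)}{n} - \dfrac{T^2}{n^2}$ ($T_r$ = total for one row) 

(3) Columns: $n \sum_{c=1}^{n} (\bar{X}_c - \bar{X})^2 = \dfrac{\sum_{c=1}^{n} (T_c^2)}{n} - \dfrac{T^2}{n^2}$ ($T_c$ = total for one column) 

(4) Treatments: $n \sum_{t=1}^{n} (\bar{X}_t - \bar{X})^2 = \dfrac{\sum_{t=1}^{n} (T_t^2)}{n} - \dfrac{T^2}{n^2}$ ($T_t$ = total for one treatment) 

(5) Error: $\sum_{i=1}^{n}\sum_{j=1}^{n} (X_{ij} - \bar{X}_r - \bar{X}_c - \bar{X}_t + 2\bar{X})^2 = (1) - (2) - (3) - (4)$ 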

The standard error in a Latin square is 

$$
s = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} (X_{ij} - \bar{X}_r - \bar{X}_c - \bar{X}_t + 2\bar{X})^2}{(n - 2)(n - 1)}}
\tag{13.07}
$$

The standard error for the mean of one treatment is 

$$
s_{\bar{X}_t} = \frac{s}{\sqrt{n}}
\tag{13.08}
$$

Analysis of Variance of the Latin Square 

Source of variation   D.F.             Sums of squares   Mean square              Variance ratio
Rows                  (n - 1)          (2)               (2)/(n - 1)              F_r
Columns               (n - 1)          (3)               (3)/(n - 1)              F_c
Treatments            (n - 1)          (4)               (4)/(n - 1)              F_t
Error                 (n - 2)(n - 1)   (5)               (5)/[(n - 2)(n - 1)]
Total                 (n² - 1)         (1)               (1)/(n² - 1)
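
The Latin-square computations may be sketched in Python as follows. The function name `latin_square_anova`, the 3 × 3 layout, and the observations are our own assumptions, chosen only to keep the arithmetic easy to follow; the standard error of a treatment mean is taken as the square root of the error mean square divided by n.

```python
import math


def latin_square_anova(layout, values):
    """Sums of squares for an n x n Latin square (n must be at least 3).

    layout[r][c] is the treatment letter and values[r][c] the observation
    in row r, column c.
    """
    n = len(values)
    grand_total = sum(sum(row) for row in values)
    correction = grand_total ** 2 / n ** 2                 # T^2 / n^2

    total = sum(x * x for row in values for x in row) - correction
    rows = sum(sum(row) ** 2 for row in values) / n - correction
    column_totals = [sum(values[r][c] for r in range(n)) for c in range(n)]
    columns = sum(ct ** 2 for ct in column_totals) / n - correction

    treatment_totals = {}
    for r in range(n):
        for c in range(n):
            letter = layout[r][c]
            treatment_totals[letter] = treatment_totals.get(letter, 0) + values[r][c]
    treatments = sum(tt ** 2 for tt in treatment_totals.values()) / n - correction

    error = total - rows - columns - treatments            # (1)-(2)-(3)-(4)
    ms_error = error / ((n - 2) * (n - 1))
    return {
        "total": total,
        "rows": rows,
        "columns": columns,
        "treatments": treatments,
        "error": error,
        "F_treatments": (treatments / (n - 1)) / ms_error,
        "se_treatment_mean": math.sqrt(ms_error / n),
    }


# Hypothetical 3 x 3 Latin square and observations (not data from the text).
layout = ["ABC", "BCA", "CAB"]
values = [
    [10, 12, 14],
    [13, 15, 11],
    [15, 13, 14],
]
result = latin_square_anova(layout, values)
print(result)
```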



A Greco-Latin Square. A Greco-Latin square is formed by a pair of 
Latin squares—one written with Latin, the other with Greek, letters— 
which, when superimposed, possess the property that each Latin letter 
occurs once in each row and in each column, and each Greek letter 
appears once in each row and in each column, and with each Latin letter 
(Ref. 7). Thus: 


Aα   Bβ   Cγ
Bγ   Cα   Aβ
Cβ   Aγ   Bα


The two squares are orthogonal to each other. Orthogonality is that 
property of an experimental design which makes possible the direct 
and separate estimates of each of the several effects. From analytical 
geometry it is recalled, for instance, that two planes, 

ax + by + cz + d = 0 and a'x + b'y + c'z + d' = 0 

are orthogonal (perpendicular) if aa' + bb' + cc' = 0. The principle of 
orthogonality is a basic one in modern experimental designs. 
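
Both senses of orthogonality mentioned above can be verified mechanically. The Python sketch below checks the 3 × 3 Greco-Latin square given in the text and the condition for two planes; the function names are our own.

```python
latin = ["ABC", "BCA", "CAB"]
greek = ["αβγ", "γαβ", "βγα"]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def are_orthogonal(first, second):
    """True if, on superposition, every ordered pair of symbols occurs exactly once."""
    n = len(first)
    pairs = {(first[r][c], second[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n


def planes_orthogonal(normal_1, normal_2):
    """True if aa' + bb' + cc' = 0 for the coefficient triples (a, b, c)."""
    return sum(u * v for u, v in zip(normal_1, normal_2)) == 0


print(is_latin_square(latin), is_latin_square(greek), are_orthogonal(latin, greek))
print(planes_orthogonal((1, 0, 0), (0, 1, 0)))
```

A Latin square is never orthogonal to itself, since superposition then yields only n distinct pairs instead of n².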

A Latin-Square Design in Psychology. Although the Latin square 
was originally designed in agricultural experimentation to eliminate from 
the experimental comparisons possible differences in soil fertility among 
plots in rows and in columns, it has found useful application in other fields. 
It is especially advantageous when the disturbing effects of two factors 
need to be eliminated from the experimental comparisons. In experiments 
in psychology, for example, the effect of the sequence or order 
of the experimental factors or situations in space or in time may need 
elimination. 

Thus, in an experiment (Ref. 9) the object was to find out the effect 
upon recognition of colors when they were presented to the dark-adapted 
eye of the subject under different degrees of illumination. The following 
analysis-of-variance table reveals the skeleton of the experimental design 
and the corresponding divisions of sums of squares and mean squares 
into the several sources of variation: 


Source of variation                      D.F.   Sum of squares   Mean square
Among orders of presentation (rows)        3    Ss (orders)      Ss (orders)/3
Among illumination levels (columns)        3    Ss (levels)      Ss (levels)/3
Among colors                               3    Ss (colors)      Ss (colors)/3
Experimental error                         6    Ss (error)       Ss (error)/6
Total                                     15    Ss (total)

F = mean square (color) / mean square (error) 


There were 4 colors—yellow, green, blue, and red—and 4 levels of 
illumination. Each color was presented once in the first, second, third, 
and fourth order of presentation at each of the four levels of illumination, 
and once in the first, second, third, and fourth place in each series order. 
The colors were arranged at random with this double restriction. It is 
to be noted that, since all treatments are equally represented in all rows 
and all columns, no part of treatment differences is included in the row 
and column comparisons. Thus, the effects of order and illumination 
levels were removed from the measurement of accuracy of recognition 
of colors. The measurement consisted in the percentage of the experimental 
subjects who identified each color correctly. In order to apply 
the analysis of variance to the percentages, a prior transformation 
of the data was necessary (see page 165). 

An extension of the experiment to determine the effect of the form 
of the stimulus would require the measurement of the combined effect 
of color and form. For this purpose the Greco-Latin square could be 
used, in which each color and each form would be combined so that each 
color-form combination occurs once and only once. Such color-form 
combinations would then be handled as a Latin square. 

Factorial Design. A formal experiment is designed and executed 
with meticulous care to provide answers to definite questions. The worth 
of the experiment is contingent on how wisely the questions have been 
conceived and formulated. It is fundamental to understand thoroughly 
the purpose and ultimate applicability of the experiment. A big advantage 
of complex experiments, that is, those designed to secure answers 
to a number of definite questions, lies in the fact that they afford results 
of wider applicability than do simple ones. Until recently, it was 
regarded as essential that an experiment should be simple and restricted 
to answering a single question regarding the effect of a single factor. It 
is important in setting forth the plans of an experiment to answer the 
questions which prompted the research, to list all the variables that might 
conceivably influence the results. Due attention must be given to the 
possible results and their interpretation. Even after listing all the 
variables that occur to the experimenter, there are others which are not 
suspected. As many as possible of the variables need to be controlled. 
However, it is usually desired to secure comparisons under a wide range 
of conditions of certain variables. In carrying out comparisons of two 
treatments, for instance, under the same conditions, the relative efficacy 
may be accurately determined under certain fixed conditions. However, 
unless these experimental conditions duplicate the practical conditions, 
the findings of the former may not be applicable at all to the 
latter. An average value of the ratio of the measures of the treatment 
effects over a range of conditions is usually the quantity wanted in practical 
application. In experiments based on the assumption of controlling 
all factors except the one under investigation, it is often observed that 
the results change from one experiment to another of the same kind. 
The difficulty or impossibility of controlling or isolating the various 
factors involved in experimentation precluded conclusive results in most 
cases of the traditional “controlled” experiment. Furthermore, as 
pointed out above, it is usually most important to observe the effects of 
factors in as nearly a natural setting as possible. 

The desideratum in experimentation of observing the effects of 
varying all the essential conditions simultaneously rather than one at a 
time attains a substantial realization in the modern methods of design 
devised to cope with this problem. A very considerable advance has 
been brought about by the factorial design in experimentation. In this 
design, all the factors to be examined are varied concurrently in all 
possible combinations. The principal advantages of this type of design 
over the traditional experiment planned to examine a single question, or a 
single factor, consist in its greater efficiency and comprehensiveness. 
This superiority is achieved through the fact that in a factorial experiment, 
every trial contributes to the answering of every question with 
almost the same precision as though the whole experiment had been given 
over to any one of them. In addition to measuring the effect of each 
of the single factors, the measures of the effects of the interaction of all 
combinations of factors are made with the same precision. The latter 
advantage is especially great, since, with separate single-factor experiments, 
information could not possibly be deduced concerning the interaction 
of the different factors. 

The investigation of the interactions, though a highly important 
consideration, frequently was overlooked completely until appropriate 
means for the measurement of these interactions were developed. A 
third distinct advantage of factorial design is that this plan gives results 
of wider applicability than do single experiments, since the exact standardization 
of experimental conditions prescribed for the traditional 
experimental design gives information only in respect to a narrowly 
restricted set of conditions. In the factorial design the ingredients may 
be varied, that is, applied at different levels, whereas in the single-factor 
experiments standardization requires that the other factors be kept 
constant. Rarely is it possible to achieve the degree of standardization 
required for conclusive results. 

A Factorial Experiment in Psychology. The principles of factorial 
design are illustrated by presenting the design and the analysis of the 
results of an experiment in psychology. 

The psychological experiment¹ consisted in determining the difference 
limen (D.L.) of subjects for weights increasing at constant rates. Seven 
different standard weights (100, 150, 200, 250, 300, 350, and 400 grams) 
and four different rates of 50, 100, 150, and 200 grams per 30 seconds 
were used. Four men and four women constituted the experimental 
subjects. Two of each sex were normally sighted; two of each sex were 
congenitally blind. Five difference limen values were determined for each 
subject on each of the 28 rate-weight combinations. The order of presentation 
of each combination was established in advance by the use of Fisher 
and Yates’s set of random sample numbers. The reality of the subject’s 
response was checked by catch stimuli randomly introduced. The entire 
experiment was repeated on each subject after an interval of one week. 
Thus, there were 280 D.L.-values for each of the eight subjects. The 
experimental arrangement may be called a 4 × 7 × 2 × 2 × 2 factorial 
design, that is, the combination of 4 rates, 7 weights, 2 sights, 2 sexes, 
and 2 dates. 

The mean D.L.-value of five trials for each individual on each of the 
weight-rate combinations for each of the 2 dates was the basis of our 
statistical analysis. Let us designate the notations for the different 
variables. The individuals were classified into two sexes, the male 
being denoted by I and the female by II. Each sex was classified into 
two sights: the normal denoted by A and the congenitally blind by B. 
Each individual tried seven different weights: the weight of 100 grams is 
denoted by 1; of 150, by 2; of 200, by 3; of 250, by 4; of 300, by 5; of 
350, by 6; and of 400, by 7. Each weight is combined with each of the 
four rates: 50 grams per 30 seconds is denoted by a; 100, by b; 150, by c; 
and 200, by d. Observations of each individual trial were obtained on 
two different dates: the first date is denoted by α, and the second by β. 
Hence, we have 2 × 2 × 7 × 4 × 2 = 224 subgroups. Furthermore, 
we have two people for each subgroup, denoted by (1) and (2). Altogether, 
then, we have 448 measures of D.L.-values (see Table 89). 

¹ For a detailed description of the experiment, the mathematical solution of the 
problem, and the complete analysis and interpretation of the results, see Ref. 14. 
The assumption underlying the analysis of variance, that experimental errors are 
normally distributed with a common variance, was studied by plotting the errors 
both for totals and subgroups. Within the limitations of the method, the assumptions 
appeared satisfied. 

TABLE 89 

Mean D.L. (Measurement in Grams) of All Individuals on Different Combinations: $X_{sijklt}$ 
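
The counts quoted above follow directly from the levels of the five factors, and may be checked by enumerating the combinations. In the Python sketch below the labels used for the two dates and the variable names are our own; the factor levels are those stated in the text.

```python
from itertools import product

sexes = ["I", "II"]                       # male, female
sights = ["A", "B"]                       # normal, congenitally blind
weights = [100, 150, 200, 250, 300, 350, 400]   # grams
rates = [50, 100, 150, 200]               # grams per 30 seconds
dates = ["first", "second"]
individuals = [1, 2]                      # two people in each subgroup
trials_per_combination = 5

# One subgroup for every sex x sight x weight x rate x date combination.
subgroups = list(product(sexes, sights, weights, rates, dates))
# One mean D.L.-value for every individual in every subgroup.
measures = [(*group, person) for group in subgroups for person in individuals]
# Raw D.L.-values recorded for a single subject over both dates.
raw_values_per_subject = len(weights) * len(rates) * trials_per_combination * len(dates)

print(len(subgroups), len(measures), raw_values_per_subject)
```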

Mathematically, each measure is denoted by $X_{sijklt}$, which is the 
score made by the tth individual of the sth sex and the ith sight for the 
jth weight and the kth rate on the lth date. The mathematical expression 
of the D.L.-value of the tth individual in the sth sex of the ith sight on 
the jth weight of the kth rate at the lth date is 

$$
\begin{aligned}
X_{sijklt} = {} & A + B_s + C_i + D_j + E_k + F_l \\
& + I_{si} + I_{sj} + I_{sk} + I_{sl} + I_{ij} + I_{ik} + I_{il} + I_{jk} + I_{jl} + I_{kl} \\
& + I_{sij} + I_{sik} + I_{sil} + I_{sjk} + I_{sjl} + I_{skl} + I_{ijk} + I_{ijl} + I_{ikl} + I_{jkl} \\
& + I_{sijk} + I_{sijl} + I_{sikl} + I_{sjkl} + I_{ijkl} + I_{sijkl} + Z_{sijklt}
\end{aligned}
\tag{13.09}
$$

where the subscripts s, i, j, k, l, and t refer to sex, sight, weight, rate, date, 
and the particular tth individual, respectively; A is the grand mean of all 
individuals; B, C, D, E, and F are the measures of the main effects with 
respect to their own subscripts; the I's are the measures of interactions 
with respect to their own subscripts; and $Z_{sijklt}$ is experimental error. 

The mathematical solution of the problem for securing the maximum 
likelihood estimates of each of the components in (13.09) is the same 
as that used in Chapter XI. In order to save space, we shall simply 
summarize all the results given in Table 90. 

We wish to evaluate the 33 terms (listed below) in order to obtain
all the sums of squares for the complete analysis of variance. To get
the value of the term $\sum_s \sum_i \sum_j \sum_k \sum_l \sum_t X_{sijklt}^2$, we simply
work out the sum of squares of all the figures in Table 89. There are two
methods of evaluating each of the other terms. The first method includes
three steps: (1) work out the square of each sum of scores in the
appropriate table;² (2) add the squares; (3) divide by the appropriate
number, which is the number of individual measures involved in each sum of
scores. The second method also includes three steps: (1) work out the
square of each mean score in the appropriate table;² (2) add these squares;
(3) multiply by the appropriate number, which is the number of individual
measures involved in each mean score (this number will be the same as in
the first method). We prefer to use the first method in calculation since
it is more accurate from the viewpoint of significant figures. We use the
second method, since it is simpler, in the presentation of the formulas.

² "Appropriate table" refers to the table set up for securing the sum of
scores and mean of scores required in each case. Since 37 tables were
required, they are not reproduced here. We shall illustrate the procedure
in obtaining the sum of scores and the mean of scores for each subgroup.
The sum of scores is obtained by adding the scores of (1) and (2) of
Table 89. Thus: 4.5 + 14.0 = 18.5. The mean score is obtained by dividing
the sum of scores by 2: 18.5/2 = 9.25. Mathematically, the sum of scores is
denoted by $\sum_t X_{sijklt}$, where $\sum_t$ means the summation over the two
individuals; and the mean score is denoted by $\bar{X}_{sijkl\cdot}$, which is
the mean score of the $s$th sex and the $i$th sight for the $j$th weight and
the $k$th rate on the $l$th date.
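The equivalence of the two methods can be checked on a small scale. The sketch below is only an illustration: the first pair of scores is the one quoted in footnote 2, the other two pairs are invented for the example, and the function names are ours.

```python
def term_from_sums(groups):
    """Method 1: square each group sum, add the squares, divide by the group size."""
    n = len(groups[0])
    return sum(sum(g) ** 2 for g in groups) / n


def term_from_means(groups):
    """Method 2: square each group mean, add the squares, multiply by the group size."""
    n = len(groups[0])
    return n * sum((sum(g) / n) ** 2 for g in groups)


# First pair: the scores of individuals (1) and (2) quoted in footnote 2.
# The remaining pairs are hypothetical values used only for illustration.
subgroups = [(4.5, 14.0), (10.0, 12.0), (7.5, 8.5)]

by_sums = term_from_sums(subgroups)
by_means = term_from_means(subgroups)
print(by_sums, by_means)
```

Both functions return the same quantity; the first avoids rounding the means before squaring, which is the reason given above for preferring it in calculation.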

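By following the working procedure indicated in method 1, the values
of all the 33 terms for our problem are obtained as follows:

$$
\begin{aligned}
b &= \tfrac{1}{2}\sum_{s,i,j,k,l}\Big(\sum_{t} X_{sijklt}\Big)^2 = 2\sum_{s,i,j,k,l}\bar{X}^2_{sijkl\cdot} = 408{,}702.49 \\
c_1 &= \tfrac{1}{4}\sum_{s,i,j,k}\Big(\sum_{l,t} X_{sijklt}\Big)^2 = 4\sum_{s,i,j,k}\bar{X}^2_{sijk\cdot\cdot} = 402{,}722.52 \\
c_2 &= \tfrac{1}{8}\sum_{s,i,j,l}\Big(\sum_{k,t} X_{sijklt}\Big)^2 = 8\sum_{s,i,j,l}\bar{X}^2_{sij\cdot l\cdot} = 342{,}295.6 \\
c_3 &= \tfrac{1}{14}\sum_{s,i,k,l}\Big(\sum_{j,t} X_{sijklt}\Big)^2 = 14\sum_{s,i,k,l}\bar{X}^2_{si\cdot kl\cdot} = 395{,}929.74 \\
c_4 &= \tfrac{1}{4}\sum_{s,j,k,l}\Big(\sum_{i,t} X_{sijklt}\Big)^2 = 4\sum_{s,j,k,l}\bar{X}^2_{s\cdot jkl\cdot} = 370{,}451.44 \\
c_5 &= \tfrac{1}{4}\sum_{i,j,k,l}\Big(\sum_{s,t} X_{sijklt}\Big)^2 = 4\sum_{i,j,k,l}\bar{X}^2_{\cdot ijkl\cdot} = 348{,}532.41 \\
d_1 &= \tfrac{1}{16}\sum_{s,i,j}\Big(\sum_{k,l,t} X_{sijklt}\Big)^2 = 16\sum_{s,i,j}\bar{X}^2_{sij\cdot\cdot\cdot} = 340{,}052.38 \\
d_2 &= \tfrac{1}{28}\sum_{s,i,k}\Big(\sum_{j,l,t} X_{sijklt}\Big)^2 = 28\sum_{s,i,k}\bar{X}^2_{si\cdot k\cdot\cdot} = 395{,}333.63 \\
d_3 &= \tfrac{1}{56}\sum_{s,i,l}\Big(\sum_{j,k,t} X_{sijklt}\Big)^2 = 56\sum_{s,i,l}\bar{X}^2_{si\cdot\cdot l\cdot} = 334{,}642.70 \\
d_4 &= \tfrac{1}{8}\sum_{s,j,k}\Big(\sum_{i,l,t} X_{sijklt}\Big)^2 = 8\sum_{s,j,k}\bar{X}^2_{s\cdot jk\cdot\cdot} = 368{,}014.35 \\
d_5 &= \tfrac{1}{16}\sum_{s,j,l}\Big(\sum_{i,k,t} X_{sijklt}\Big)^2 = 16\sum_{s,j,l}\bar{X}^2_{s\cdot j\cdot l\cdot} = 311{,}866.98 \\
d_6 &= \tfrac{1}{28}\sum_{s,k,l}\Big(\sum_{i,j,t} X_{sijklt}\Big)^2 = 28\sum_{s,k,l}\bar{X}^2_{s\cdot\cdot kl\cdot} = 365{,}342.11 \\
d_7 &= \tfrac{1}{8}\sum_{i,j,k}\Big(\sum_{s,l,t} X_{sijklt}\Big)^2 = 8\sum_{i,j,k}\bar{X}^2_{\cdot ijk\cdot\cdot} = 346{,}776.80 \\
d_8 &= \tfrac{1}{16}\sum_{i,j,l}\Big(\sum_{s,k,t} X_{sijklt}\Big)^2 = 16\sum_{i,j,l}\bar{X}^2_{\cdot ij\cdot l\cdot} = 293{,}525.33 \\
d_9 &= \tfrac{1}{28}\sum_{i,k,l}\Big(\sum_{s,j,t} X_{sijklt}\Big)^2 = 28\sum_{i,k,l}\bar{X}^2_{\cdot i\cdot kl\cdot} = 341{,}930.40 \\
d_{10} &= \tfrac{1}{8}\sum_{j,k,l}\Big(\sum_{s,i,t} X_{sijklt}\Big)^2 = 8\sum_{j,k,l}\bar{X}^2_{\cdot\cdot jkl\cdot} = 331{,}333.72 \\
e_1 &= \tfrac{1}{112}\sum_{s,i}\Big(\sum_{j,k,l,t} X_{sijklt}\Big)^2 = 112\sum_{s,i}\bar{X}^2_{si\cdot\cdot\cdot\cdot} = 334{,}243.15 \\
e_2 &= \tfrac{1}{32}\sum_{s,j}\Big(\sum_{i,k,l,t} X_{sijklt}\Big)^2 = 32\sum_{s,j}\bar{X}^2_{s\cdot j\cdot\cdot\cdot} = 310{,}616.80 \\
e_3 &= \tfrac{1}{56}\sum_{s,k}\Big(\sum_{i,j,l,t} X_{sijklt}\Big)^2 = 56\sum_{s,k}\bar{X}^2_{s\cdot\cdot k\cdot\cdot} = 365{,}094.64 \\
e_4 &= \tfrac{1}{112}\sum_{s,l}\Big(\sum_{i,j,k,t} X_{sijklt}\Big)^2 = 112\sum_{s,l}\bar{X}^2_{s\cdot\cdot\cdot l\cdot} = 308{,}428.40 \\
e_5 &= \tfrac{1}{32}\sum_{i,j}\Big(\sum_{s,k,l,t} X_{sijklt}\Big)^2 = 32\sum_{i,j}\bar{X}^2_{\cdot ij\cdot\cdot\cdot} = 292{,}577.43 \\
e_6 &= \tfrac{1}{56}\sum_{i,k}\Big(\sum_{s,j,l,t} X_{sijklt}\Big)^2 = 56\sum_{i,k}\bar{X}^2_{\cdot i\cdot k\cdot\cdot} = 341{,}734.41 \\
e_7 &= \tfrac{1}{112}\sum_{i,l}\Big(\sum_{s,j,k,t} X_{sijklt}\Big)^2 = 112\sum_{i,l}\bar{X}^2_{\cdot i\cdot\cdot l\cdot} = 288{,}699.87 \\
e_8 &= \tfrac{1}{16}\sum_{j,k}\Big(\sum_{s,i,l,t} X_{sijklt}\Big)^2 = 16\sum_{j,k}\bar{X}^2_{\cdot\cdot jk\cdot\cdot} = 330{,}141.20 \\
e_9 &= \tfrac{1}{32}\sum_{j,l}\Big(\sum_{s,i,k,t} X_{sijklt}\Big)^2 = 32\sum_{j,l}\bar{X}^2_{\cdot\cdot j\cdot l\cdot} = 279{,}282.48 \\
e_{10} &= \tfrac{1}{56}\sum_{k,l}\Big(\sum_{s,i,j,t} X_{sijklt}\Big)^2 = 56\sum_{k,l}\bar{X}^2_{\cdot\cdot\cdot kl\cdot} = 328{,}018.72 \\
f_1 &= \tfrac{1}{224}\sum_{s}\Big(\sum_{i,j,k,l,t} X_{sijklt}\Big)^2 = 224\sum_{s}\bar{X}^2_{s\cdot\cdot\cdot\cdot\cdot} = 308{,}303.02 \\
f_2 &= \tfrac{1}{224}\sum_{i}\Big(\sum_{s,j,k,l,t} X_{sijklt}\Big)^2 = 224\sum_{i}\bar{X}^2_{\cdot i\cdot\cdot\cdot\cdot} = 288{,}579.46 \\
f_3 &= \tfrac{1}{64}\sum_{j}\Big(\sum_{s,i,k,l,t} X_{sijklt}\Big)^2 = 64\sum_{j}\bar{X}^2_{\cdot\cdot j\cdot\cdot\cdot} = 278{,}678.28 \\
f_4 &= \tfrac{1}{112}\sum_{k}\Big(\sum_{s,i,j,l,t} X_{sijklt}\Big)^2 = 112\sum_{k}\bar{X}^2_{\cdot\cdot\cdot k\cdot\cdot} = 327{,}841.31 \\
f_5 &= \tfrac{1}{224}\sum_{l}\Big(\sum_{s,i,j,k,t} X_{sijklt}\Big)^2 = 224\sum_{l}\bar{X}^2_{\cdot\cdot\cdot\cdot l\cdot} = 276{,}885.61 \\
g &= \tfrac{1}{448}\Big(\sum_{s,i,j,k,l,t} X_{sijklt}\Big)^2 = 448\,\bar{X}^2_{\cdot\cdot\cdot\cdot\cdot\cdot} = 276{,}769.37
\end{aligned}
$$
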

Substituting the above values in the appropriate formulas of Table 90, 
we obtain the specific sums of squares necessary for the complete analysis 
of variance. 
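Table 90 is not reproduced in this part of the text, so the sketch below assumes the usual pattern for a complete factorial: a main-effect sum of squares is the corresponding $f$ term minus $g$, and a first-order interaction is the $e$ term minus the two $f$ terms plus $g$. The dictionary keys and function names are ours; the numerical values are those listed above.

```python
# Correction term g and the single-factor terms f1..f5 (keyed by subscript).
g = 276769.37
f = {"s": 308303.02, "i": 288579.46, "j": 278678.28, "k": 327841.31, "l": 276885.61}

# Two-factor terms e1..e10 (keyed by the pair of retained subscripts).
e = {
    "si": 334243.15, "sj": 310616.80, "sk": 365094.64, "sl": 308428.40,
    "ij": 292577.43, "ik": 341734.41, "il": 288699.87,
    "jk": 330141.20, "jl": 279282.48, "kl": 328018.72,
}


def main_effect_ss(factor):
    """Sum of squares for one main effect (assumed form: f - g)."""
    return f[factor] - g


def first_order_ss(pair):
    """Sum of squares for a two-factor interaction (assumed form: e - f - f + g)."""
    a, b = pair
    return e[pair] - f[a] - f[b] + g


ss_sex = main_effect_ss("s")
ss_sex_sight = first_order_ss("si")
ss_sight_rate = first_order_ss("ik")
print(round(ss_sex), round(ss_sex_sight), round(ss_sight_rate))
```

Under this assumption the results agree, after rounding, with the sums of squares entered for sex, sex × sight, and sight × rate in Tables 91 and 92.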

We first test the significance of each of the interactions,³ of which
there are 10 of the first order, 10 of the second, 5 of the third, and 1 of
the fourth order. It is customary to call an interaction involving
2 factors an interaction of the first order; one involving 3 factors, an
interaction of the second order; and so on. The tests of significance of
these interactions are given in Table 91. It is noted that the following
interactions were significant:

    sex × sight × rate        sight × rate
    sex × sight               sight × weight (doubtful)
    sex × rate

The significant (including the doubtful) interactions were retained
as specific components in the analysis-of-variance table. The statistically
non-significant interactions were incorporated in experimental error.

The complete analysis of variance and the results of the corresponding
tests of the respective hypotheses are given in Table 92.

³ When two or more factors are involved such that increases or decreases in
one (or more) influence increases or decreases in the other(s), or vice
versa, interaction is said to exist.
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TABLE 91 

Tests of Significance of Interactions as Sources of Variation

Source of variation                    D.F.   Sum of    Mean       F     Test of
                                              squares   square           hypothesis*
Error                                  224    34,438      154
Sex × sight × weight × rate × date      18       270       15            Accepted
Sex × sight × weight × rate             18       320       18            Accepted
Sex × sight × weight × date              6       379       63            Accepted
Sex × sight × rate × date                3        60       20            Accepted
Sex × weight × rate × date              18       538       30            Accepted
Sight × weight × rate × date            18       205       11            Accepted
Sex × sight × weight                     6     1,406      234    1.52    Accepted
Sex × sight × rate                       3     2,216      739    4.80    Rejected
Sex × sight × date                       1       270      270    1.75    Accepted
Sex × weight × rate                     18       215       12            Accepted
Sex × weight × date                      6       637      106            Accepted
Sex × rate × date                        3        61       20            Accepted
Sight × weight × rate                   18       654       36            Accepted
Sight × weight × date                    6       340       57            Accepted
Sight × rate × date                      3        14        5            Accepted
Weight × rate × date                    18       527       29            Accepted
Sex × sight                              1    14,130   14,130   91.75    Rejected
Sex × weight                             6       405       68            Accepted
Sex × rate                               3     5,720    1,907   12.38    Rejected
Sex × date                               1         9        9            Accepted
Sight × weight                           6     2,089      348    2.26    Remains in doubt
Sight × rate                             3     2,083      694    4.51    Rejected
Sight × date                             1         4        4            Accepted
Weight × rate                           18       391       22            Accepted
Weight × date                            6       488       81            Accepted
Rate × date                              3        61       20            Accepted

* The hypothesis tested is a null hypothesis concerning the variation in the
same row. For example, the hypothesis regarding sex × sight × weight × rate
× date is that there is no significant interaction between sex, sight,
weight, rate, and date.


The tests of significance resulted in the following conclusions:

    significant main effects:               sex, sight, weight, and rate
    significant second-order interactions:  sex × sight × rate
    significant first-order interactions:   sex × sight
                                            sex × rate
                                            sight × weight
                                            sight × rate

It is worth noting that there was no significant difference between
dates and that no interaction including date as a component was
significant. This result demonstrates that the observations were
consistent among themselves.


TABLE 92 

Complete Analysis of Variance of D.L.-Values

Source of variation     D.F.   Sum of     Mean        F      Test of
                               squares    square             hypothesis*
Residual                419     41,692       100
Sex × sight × rate        3      2,216       739     7.39    Rejected
Sex × sight               1     14,130    14,130   141.30    Rejected
Sex × rate                3      5,720     1,907    19.07    Rejected
Sight × weight            6      2,089       348     3.48    Rejected
Sight × rate              3      2,083       694     6.94    Rejected
Sex                       1     31,534    31,534   315.34    Rejected
Sight                     1     11,810    11,810   118.10    Rejected
Weight                    6      1,909       318     3.18    Rejected
Rate                      3     51,072    17,024   170.24    Rejected
Date                      1        116       116     1.16    Accepted
Total                   447    164,371


* The hypothesis tested is a null hypothesis regarding the variation in the same 
row. For example, the hypothesis concerning date is that there is no significant 
difference between the date means. 
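The residual line of Table 92 can be rebuilt from Table 91 by adding to the error line the degrees of freedom and sums of squares of the 21 interactions judged non-significant. The following sketch does this; the variable names are ours, and the F shown uses the residual mean square rounded to a whole number, which is how the entries of Table 92 appear to have been formed.

```python
# Error line of Table 91.
error_df, error_ss = 224, 34438

# (D.F., sum of squares) of the 21 interactions accepted as non-significant in Table 91.
pooled = [
    (18, 270), (18, 320), (6, 379), (3, 60), (18, 538), (18, 205),  # third and fourth order
    (6, 1406), (1, 270), (18, 215), (6, 637), (3, 61),              # second order
    (18, 654), (6, 340), (3, 14), (18, 527),
    (6, 405), (1, 9), (1, 4), (18, 391), (6, 488), (3, 61),         # first order
]

residual_df = error_df + sum(df for df, _ in pooled)
residual_ss = error_ss + sum(ss for _, ss in pooled)
residual_ms = residual_ss / residual_df

# F for sex x sight x rate, using the rounded residual mean square.
f_sex_sight_rate = 739 / round(residual_ms)
print(residual_df, residual_ss, round(residual_ms, 2), f_sex_sight_rate)
```

The pooled residual has 419 degrees of freedom and a sum of squares of 41,692, as entered in the first row of Table 92.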


From the standpoint of the efficiency of the factorial design in this
experiment, it can be said that we have tested 26 hypotheses regarding
interactions and 5 hypotheses concerning main effects. If we had used
the single-factor plan of experiment, we should have required 56
experiments for testing the main effects of rate; 32, for weight; 112, for
sex; 112, for sight; and 112, for date. We also would have had to repeat
the t-test $C_2^{2 \times 2 \times 7 \times 4 \times 2} = C_2^{224}$ times. Furthermore, no
information would be possible concerning the interaction effects.

The Problem of Prediction . The regression equations of D.L.-values 
on each of the factors and interacting factors, which were found to be 
significant, can be determined. With these equations it is possible to 
compute D.L.-values for any particular value of the independent variable 
within the range of factor levels used in the experiment. 

We shall illustrate the use of orthogonal polynomials for determining
the regression equation for predicting D.L.-values from weights.⁴

We proceed to work out linear, quadratic, and cubic regression
equations. Only the linear coefficient was found significant here, but the
methods of calculating the latter two are also given. We shall show the
method of separating effects associated with more than one degree of
freedom into component parts that are mutually orthogonal. Because
of the latter property, the components may be estimated from the data.
If in our experiment there is only 1 degree of freedom representing the
tested variation, for example, sex and date, there can be only a linear
relation between the two levels of variation and the D.L.-values. If
there are more than 2 degrees of freedom or more than 3 levels of
variation, then these can be separated into component parts (linear,
quadratic, cubic, and so on) that are mutually independent. Even when
there are more than 3 degrees of freedom or more than 4 levels of
variation, we usually do not calculate terms higher than the cubic.

⁴ For the regression equations of the other significant factors in this
study, see the original article, Ref. 14. Other useful references are 2 and
10, particularly 10, for the discussion of the meaning of the linear,
quadratic, and cubic terms.

We first record the means of the D.L.-values found for each weight
and transform them as follows:

    W (weight)    Y (D.L.-value)      x          y
       100           28.7922         -3        3.9368
       150           26.4141         -2        1.5587
       200           25.6344         -1        0.7790
       250           23.7454          0       -1.1100
       300           23.8297          1       -1.0257
       350           23.2563          2       -1.5991
       400           22.3156          3       -2.5397

    Mean W = 250   Mean Y = 24.8554   Σx² = 28

where $x = \dfrac{W - 250}{50}$ and $y = Y - 24.8554$.

We then refer to the tables of Fisher and Yates on orthogonal
polynomials (Ref. 8) for N = 7, which read:

      ξ₁'      ξ₂'      ξ₃'
      -3        5       -1
      -2        0        1
      -1       -3        1
       0       -4        0
       1       -3       -1
       2        0       -1
       3        5        1

    Σξ₁'² = 28    Σξ₂'² = 84    Σξ₃'² = 6
    λ₁ = 1        λ₂ = 1        λ₃ = 1/6


Finally, we obtain all the regression equations as follows:

Linear:
$$ Y' = c_0 + c_1 x \tag{13.10} $$
where $c_0 = \bar{Y} = 24.8554$.

Quadratic:
$$ Y' = c_0' + c_1 x + c_2 x^2 \tag{13.11} $$
where $c_0' = c_0 - \dfrac{\sum x^2}{n}\, c_2$.

Cubic:
$$ Y' = c_0' + c_1 x + c_2 x^2 + c_3 x^3 \tag{13.12} $$
where
$$ c_0 = \bar{Y} \tag{13.13} $$
$$ c_0' = c_0 - \frac{\sum x^2}{n}\, c_2 \tag{13.14} $$
$$ c_1 = \lambda_1 \frac{\sum \xi_1' y}{\sum \xi_1'^2} \tag{13.15} $$
$$ c_2 = \lambda_2 \frac{\sum \xi_2' y}{\sum \xi_2'^2} \tag{13.16} $$
$$ c_3 = \lambda_3 \frac{\sum \xi_3' y}{\sum \xi_3'^2} \tag{13.17} $$


The calculation of the regression coefficients for weights is carried
out as follows:

      x        y        ξ₁'      ξ₁'y       ξ₂'      ξ₂'y       ξ₃'      ξ₃'y
     -3     3.9368      -3    -11.8104       5     19.6840      -1     -3.9368
     -2     1.5587      -2     -3.1174       0      0.0000       1      1.5587
     -1     0.7790      -1     -0.7790      -3     -2.3370       1      0.7790
      0    -1.1100       0      0.0000      -4      4.4400       0      0.0000
      1    -1.0257       1     -1.0257      -3      3.0771      -1      1.0257
      2    -1.5991       2     -3.1982       0      0.0000      -1      1.5991
      3    -2.5397       3     -7.6191       5    -12.6985       1     -2.5397

    Σξ₁'² = 28     Σξ₁'y = -27.5498
    Σξ₂'² = 84     Σξ₂'y =  12.1656
    Σξ₃'² = 6      Σξ₃'y =  -1.5140

    λ₁ = 1,  λ₂ = 1,  λ₃ = 1/6



By using Equations (13.13), (13.14), (13.15), (13.16), and (13.17),
we obtain

    c₀  = 24.8554        c₂ = .144829
    c₁  = -.983921       c₃ = -.042056
    c₀' = 24.2761

Hence, the regression equations can be obtained by substituting these
values into Equations (13.10), (13.11), and (13.12).

Linear:     $Y' = 24.8554 - .983921x$                                  (13.18)

Quadratic:  $Y' = 24.2761 - .983921x + .144829x^2$                     (13.19)

Cubic:      $Y' = 24.2761 - .983921x + .144829x^2 - .042056x^3$        (13.20)

where $x = \dfrac{W - 250}{50}$.
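The coefficients can be checked directly from the seven weight means. The sketch below assumes the N = 7 orthogonal polynomial values and the λ's quoted above, and it reads the partly illegible mean at 250 g as 23.7454, which is consistent with y = -1.1100; all variable names are ours. It also assumes the standard form ξ₃' = (x³ - 7x)/6 for N = 7, under which a full expansion of the cubic term contributes -7c₃ to the coefficient of x; that adjusted coefficient is computed separately so that the two forms can be compared.

```python
weights = [100, 150, 200, 250, 300, 350, 400]
# Mean D.L.-value for each weight (250 g entry read as 23.7454; see lead-in).
means = [28.7922, 26.4141, 25.6344, 23.7454, 23.8297, 23.2563, 22.3156]

# Orthogonal polynomial values for N = 7 and their multipliers.
xi = {
    1: [-3, -2, -1, 0, 1, 2, 3],
    2: [5, 0, -3, -4, -3, 0, 5],
    3: [-1, 1, 1, 0, -1, -1, 1],
}
lam = {1: 1.0, 2: 1.0, 3: 1.0 / 6.0}

n = len(means)
y_bar = sum(means) / n
y = [m - y_bar for m in means]             # deviations from the general mean
x = [(w - 250) / 50 for w in weights]      # coded weights -3 .. 3


def coefficient(degree):
    """lambda * sum(xi * y) / sum(xi^2) for the given degree."""
    p = xi[degree]
    return lam[degree] * sum(a * b for a, b in zip(p, y)) / sum(a * a for a in p)


c0 = y_bar
c1, c2, c3 = coefficient(1), coefficient(2), coefficient(3)
c0_prime = c0 - (sum(v * v for v in x) / n) * c2

# Coefficient of x if the cubic polynomial (x^3 - 7x)/6 is expanded in full.
c1_full_cubic = c1 - 7 * c3

print(round(c0, 4), round(c0_prime, 4), round(c1, 6), round(c2, 6), round(c3, 6))
```

Small differences in the last decimal places from the printed coefficients arise because the printed deviations were rounded to four places before summing.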
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The test of significance of the components of variation due to weight 
is given in Table 93. 

TABLE 93 

Components of Variation Due to Weight

Source of variation   D.F.   Sum of    Mean       F      Test of
                             squares   square            hypothesis
Linear                  1     1735      1735     17.35   Rejected
Quadratic               1      113       113      1.13   Accepted
Cubic                   1       24        24             Accepted
Remainder               3       37        12             Accepted
Weights                 6     1909


It is noted from Table 93 that only the linear component is significant.
Hence, only the linear equation is to be used in prediction. The graph
of the linear regression equation for the observed D.L.-values is sketched
in Fig. 8.

Figure 8. Linear regression line of the equation for predicting D.L.-values
from weight values. (Horizontal axis: weight in grams, 100 to 400.)
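The entries of Table 93 can be reproduced from the column totals of the coefficient calculation. The sketch assumes the usual formula for a single-degree-of-freedom component, namely the number of measures behind each weight mean (448/7 = 64) times (Σξ'y)² / Σξ'²; the names are ours and the deviations y are those printed in the calculation table.

```python
# Deviations of the seven weight means from the general mean, as printed.
y = [3.9368, 1.5587, 0.7790, -1.1100, -1.0257, -1.5991, -2.5397]

xi = {
    "linear": [-3, -2, -1, 0, 1, 2, 3],
    "quadratic": [5, 0, -3, -4, -3, 0, 5],
    "cubic": [-1, 1, 1, 0, -1, -1, 1],
}

measures_per_mean = 448 // 7   # 64 individual D.L.-values behind each weight mean
weights_ss = 1909              # sum of squares for weight from Table 92


def component_ss(name):
    """Sum of squares for one orthogonal component (assumed standard formula)."""
    p = xi[name]
    total = sum(a * b for a, b in zip(p, y))
    return measures_per_mean * total ** 2 / sum(a * a for a in p)


linear_ss = component_ss("linear")
quadratic_ss = component_ss("quadratic")
cubic_ss = component_ss("cubic")
remainder_ss = weights_ss - linear_ss - quadratic_ss - cubic_ss

print(round(linear_ss), round(quadratic_ss), round(cubic_ss), round(remainder_ss))
```

Dividing each component by the residual mean square of 100 gives the F values of the table, for example 1735/100 = 17.35 for the linear component.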

Factorial Design and Covariance in a Study of Educational Development.
We wish to illustrate further application of the principles of
factorial design by presenting the results of an investigation of individual
educational development.⁵ An application is also made in this study
of the method of covariance, which served to increase the precision of the
experiment. The specific design developed for this study was a 2 × 3
× 3 × 3 factorial type. The factors chosen for study were the 2 sexes,
3 scholastic standings, 3 individual orders, and 3 school grades.

In addition to the introduction of the covariance method for controlling
variables not controlled or controllable directly by the experimental
design, this experiment differs from the one in psychology just reported
in that the factorial design is of the kind in which absolute replication
is dispensed with and hidden replication is involved (Ref. 7).
This type is desirable when large numbers of combinations are tested
simultaneously without repeated use of each combination. All the
independent comparisons contained in the experiment are allotted to the
factors tested and to their interactions. Since there is no independent
comparison ascribable to pure error, the highest order interactions are
employed as the basis for measuring the precision of the main comparisons.
The situation in this study has a very wide occurrence in research work.

The criterion score used as a measure of the stage of educational
development was a composite of the scores on nine separate tests
(Ref. 13). The standard scores used, ranging from 0 to 30 with a mean of
15, were determined from the combined grades, that is, the tenth, eleventh,
and twelfth grades. There were 18 students from each of the 3 grades, all
chosen at random from the total number enrolled in these grades. The
mental-age scores were obtained from the administration of a group test of
mental ability and were calculated for all students as of the same date.
All students in the tenth grade were of chronological age fifteen; in
grade 11, sixteen; in grade 12, seventeen. Students were classified into
one of three scholastic groups (good, average, poor) based on their
honor-point ratios. Individual order of educational development was based
on the size of the scores of the individuals on the second of the two
administrations of the battery of tests. The interval between the two
administrations was 12 months.

Let us denote the final score, the initial score, and the mental-age
score by Y, X₁, and X₂, respectively. Again, the two sexes are denoted
by I for the male and II for the female; the three grades are denoted by A
for grade 10, B for grade 11, and C for grade 12. The three scholastic
standings are denoted by 1 for the good, 2 for the average, and 3 for the
poor; and the three individual orders by α for the first, β for the second,
and γ for the third. The primary data, grouped into the several subclasses
in accordance with the notations specified, are presented in Table 94.

⁵ For the complete analysis of the experimental results in this
investigation, their interpretation, and the mathematical formulation and
solution of the problem, read Ref. 13.

TABLE 94 

Scores for All Sex × Grade × Scholastic × Individual Combinations

                              Grade A          Grade B          Grade C
Sex  Scholastic  Order      Y   X₁   X₂      Y   X₁   X₂      Y   X₁   X₂
 I       1         α       30   28   45     26   22   62     29   25   60
 I       1         β       25   22   58     26   21   57     29   24   88
 I       1         γ       22   19   46     24   21   65     22   19   64
 I       2         α       26   22   56     24   25   54     23   21   64
 I       2         β       17   14   19     23   18   55     20   17   47
 I       2         γ       14   14   29     15   13   24     19   17   75
 I       3         α       18   18   34     18   17   40     17   16   29
 I       3         β       17   14   17     16   13   24     15   15   38
 I       3         γ       12    9   19     13   12   23     14   12   28
 II      1         α       21   16   44     26   22   60     33   29   94
 II      1         β       21   21   44     25   22   57     29   29   89
 II      1         γ       19   17    6     23   19   52     25   22   78
 II      2         α       20   18   38     22   19   54     23   21   50
 II      2         β       18   16   27     21   19   54     18   19   57
 II      2         γ       14   14   18     17   16   52     17   17   43
 II      3         α       14    9   18     19   17   40     15   13   36
 II      3         β       12    7   18     15   12   28     15   14   35
 II      3         γ        9    7    5     13   12   48     10    9   14


In our problem, we define:

$Y_{sijt}$ = the final standard score of the $t$th individual of the $j$th
scholastic standing in the $i$th grade and the $s$th sex.

$X_{1sijt}$ = the initial standard score of the $t$th individual of the $j$th
scholastic standing in the $i$th grade and the $s$th sex.

$X_{2sijt}$ = the mental-age score of the $t$th individual of the $j$th
scholastic standing in the $i$th grade and the $s$th sex.

In the above definitions, s = 1, 2; i = 1, 2, 3; j = 1, 2, 3; t = 1, 2, 3.

We then proceed to obtain all the sums of squares and products
required for the analysis as shown in Table 95. These are listed below,
together with the notation for each quantity. We shall illustrate how
these values are obtained by two examples.


Example 1. In order to evaluate the term $\sum_s \sum_i \sum_j \sum_t Y_{sijt}^2$, we
simply refer to Table 94 and work out all the squares of the Y-measures.
Then we sum these squares and obtain the required value, for example,
$a_1 = 22,730$.

TABLE 95

Sum of Squares and of Products for Each Source of Variation


TABLE 96 

Sum of Scores for Each Sex × Grade × Scholastic Combination

                       Grade A             Grade B             Grade C
Sex  Scholastic     ΣY   ΣX₁   ΣX₂      ΣY   ΣX₁   ΣX₂      ΣY   ΣX₁   ΣX₂
 I       1          77    69   149      76    64   184      80    68   212
 I       2          57    50   104      62    56   133      62    55   186
 I       3          47    41    70      47    42    87      46    43    95
 II      1          61    54    94      74    63   169      87    80   261
 II      2          52    48    83      60    54   160      58    57   150
 II      3          35    23    41      47    41   116      40    36    85


Example 2. In order to evaluate the term

$$ \frac{1}{3}\sum_s \sum_i \sum_j \Big(\sum_t Y_{sijt}\Big)\Big(\sum_t X_{1sijt}\Big) $$

we refer to Table 96, then compute the product of ΣY and ΣX₁ for each
combination (the same row and the same grade). We then add these products
and divide by 3 to obtain the quantity: $b_{41} = 19,786$.

The sum of scores for each sex × grade × scholastic combination
as given in Table 96 was obtained by adding the scores for α, β, and γ
as given in Table 94. Thus: 30 + 25 + 22 = 77.

By following similar procedures as illustrated for Examples 1 and 2,
we obtain all the values for the 96 terms extending from $a_1$ through $e_6$.

Here we shall present the results based on one analysis only:⁶ the
complete analysis of variance and covariance partialing out the effects of
both initial score and mental age.
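The two examples can be followed step by step from the cell totals of Table 96. In the sketch below the totals ΣY and ΣX₁ are copied from that table in row order (sex I, scholastic 1 to 3, then sex II), three grades per row; the variable names are ours, and the symbols follow the notation of the listing below.

```python
# Cell totals from Table 96: each entry is (sum of Y, sum of X1) over the
# three individuals of one sex x grade x scholastic combination.
cells = [
    (77, 69), (76, 64), (80, 68),   # sex I,  scholastic 1, grades A, B, C
    (57, 50), (62, 56), (62, 55),   # sex I,  scholastic 2
    (47, 41), (47, 42), (46, 43),   # sex I,  scholastic 3
    (61, 54), (74, 63), (87, 80),   # sex II, scholastic 1
    (52, 48), (60, 54), (58, 57),   # sex II, scholastic 2
    (35, 23), (47, 41), (40, 36),   # sex II, scholastic 3
]

individuals_per_cell = 3
n_total = individuals_per_cell * len(cells)   # 54 students in all

# Squares and products of the cell totals, divided by the 3 scores in each total.
b11 = sum(sy * sy for sy, _ in cells) / individuals_per_cell
b21 = sum(sx * sx for _, sx in cells) / individuals_per_cell
b41 = sum(sy * sx for sy, sx in cells) / individuals_per_cell

# Correction terms based on the grand totals.
total_y = sum(sy for sy, _ in cells)
total_x1 = sum(sx for _, sx in cells)
e1 = total_y ** 2 / n_total
e2 = total_x1 ** 2 / n_total
e4 = total_y * total_x1 / n_total

print(round(b11), round(b21), round(b41), round(e1), round(e2), round(e4))
```

The value of `b41` is the quantity worked out in Example 2; the other five agree, after rounding, with the corresponding entries listed for the terms b and e.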

Values required for obtaining the sums of squares and products
specified in Table 95:

The first subscript m = 1, 2, ..., 6 refers to Y², X₁², X₂², YX₁, YX₂, and
X₁X₂, respectively. For the squares (m = 1, 2, 3) both factors in each
product below are the same variable; for the products (m = 4, 5, 6) they
are the two variables named. Writing U and V for the two variables:

$$
\begin{aligned}
a_m &= \sum_{s,i,j,t} U_{sijt} V_{sijt} \\
b_{m1} &= \tfrac{1}{3}\sum_{s,i,j}\Big(\sum_t U\Big)\Big(\sum_t V\Big), \quad
b_{m2} = \tfrac{1}{3}\sum_{s,i,t}\Big(\sum_j U\Big)\Big(\sum_j V\Big), \\
b_{m3} &= \tfrac{1}{3}\sum_{s,j,t}\Big(\sum_i U\Big)\Big(\sum_i V\Big), \quad
b_{m4} = \tfrac{1}{2}\sum_{i,j,t}\Big(\sum_s U\Big)\Big(\sum_s V\Big) \\
c_{m1}, c_{m2}, c_{m3} &= \tfrac{1}{9}\sum\Big(\sum\sum U\Big)\Big(\sum\sum V\Big)
 \text{ over the pairs } (s,i),\ (s,j),\ (s,t) \\
c_{m4}, c_{m5}, c_{m6} &= \tfrac{1}{6}\sum\Big(\sum\sum U\Big)\Big(\sum\sum V\Big)
 \text{ over the pairs } (i,j),\ (i,t),\ (j,t) \\
d_{m1} &= \tfrac{1}{27}\sum_{s}\Big(\sum_{i,j,t} U\Big)\Big(\sum_{i,j,t} V\Big), \quad
d_{m2}, d_{m3}, d_{m4} = \tfrac{1}{18}\sum\Big(\sum\sum\sum U\Big)\Big(\sum\sum\sum V\Big)
 \text{ over } i,\ j,\ t \\
e_m &= \tfrac{1}{54}\Big(\sum_{s,i,j,t} U\Big)\Big(\sum_{s,i,j,t} V\Big)
\end{aligned}
$$

 m   Quantity      a_m       b_m1      b_m2      b_m3      b_m4
 1   Y²           22,730    22,348    21,565    22,500    22,599
 2   X₁²          17,926    17,565    16,939    17,643    17,712
 3   X₂²         127,369   122,832   113,596   116,146      …
 4   YX₁          20,116    19,786      …         …         …
 5   YX₂          52,005    51,241    48,422    50,734    51,639
 6   X₁X₂         46,227    45,499    43,010    44,924    45,882

 m   Quantity      c_m1      c_m2      c_m3      c_m4      c_m5      c_m6
 1   Y²           21,247    22,191    21,447    22,259    21,482    22,461
 2   X₁²          16,658    17,365    16,774    17,439    16,813    17,564
 3   X₂²         111,351   114,116   106,019      …         …         …
 4   YX₁          18,805    19,620    18,954    19,688    18,994    19,853
 5   YX₂          47,828    50,166    47,627    50,931    48,212    50,698
 6   X₁X₂         42,482    44,368    42,086    45,181    42,794    44,893

 m   Quantity      d_m1      d_m2      d_m3      d_m4       e_m
 1   Y²              …         …         …         …       21,123
 2   X₁²             …         …         …         …       16,503
 3   X₂²         104,877   110,645   114,036   105,831   104,808
 4   YX₁          18,694    18,741    19,590    18,924    18,670
 5   YX₂          47,097    47,646    50,124    47,596    47,051
 6   X₁X₂         41,625    42,285    44,346    42,059    41,588

⁶ For a complete analysis see Ref. 13. The examination of the assumptions
underlying the analysis of variance and covariance led to their acceptance
insofar as they could be tested. See pages 218-219 and Ref. 1 in Chapter X,
and pages 251-260 in Chapter XI.


The application of the method involves the calculation of the sums of 
squares of the dependent variable and of each of the two independent 
variables, and the sums of products of each of the independent variables 
with the dependent variate to be adjusted and with each other. These 
values are obtained by applying the appropriate formulas in Table 95. 

We first test the significance of the interactions. The complete 
analysis resulting in the tests of significance of the several hypotheses is 
given in Table 97. Since the adjustment for the two concomitant 
variates has been obtained from the error term, 2 degrees of freedom 
ascribed to error have been used in evaluating it. The reduced sum of 
squares assigned to error is divided by the corresponding number of 
degrees of freedom to obtain the mean square (1.41) appropriate to test¬ 
ing the significance of the remaining interactions. No significant inter¬ 
action was found. Therefore, 44 degrees of freedom became available 
for testing the significance of the main effects. 



Test of Significance of Interactions
(Partialing out the effects of both initial score and M.A.)

Illustration of Test of Significance with Reduced Σy²
(Partialing out the effects of X₁ and X₂)
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The complete analysis of variance 
and covariance of the final scores, 
partialing out the joint effect of initial 
score and mental-age score, is pre¬ 
sented in Table 98. The analysis, 
which has used all the evidence of the 
relevant data, led to the conclusion 
that there was a significant difference 
among the means of the final scores of 
the scholastic groups and of the indi¬ 
vidual orders of development when 
adjustments were made for the differ¬ 
ences in initial and mental-age scores. 
The difference between the adjusted
means of the sexes was significant at
the 5 per cent level.

The whole procedure of making an exact test of significance based on the
reduced Σy² when there are two independent variates is illustrated for the
test of significance for "grade" in Table 99.⁷

⁷ For the detailed solution of the problem of estimation, see Ref. 13.

Problems

1. Design an experiment to determine the effect of training upon
   individual differences.

2. Design a factorial experiment for determining the effect of practice
   of different levels and kinds upon transfer of training.

3. Design a factorial experiment to determine the effect of various
   lengths and frequencies of intervals upon learning a fundamental
   process in arithmetic.

4. Design an educational experiment which makes use of the Latin-square
   arrangement.

5. Devise a method of comparing the efficiency from the use of the
   following three different types of experimental design: Assume the
   experiment is designed to determine if there is a differential effect of
   three different treatments (for example, different dietary treatments on
   school children). Let A, B, and C represent the three treatments; O,
   the dummy treatment; I, II, III, the three school terms. In Design 1
   the possible diet sequences are given by 1, 2, 3, . . . , 24.

   Design 1

   Sequence           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   School term I      O  O  O  O  O  O  A  A  A  A  A  A  B  B  B  B  B  B  C  C  C  C  C  C
   School term II     A  A  B  B  C  C  O  O  B  B  C  C  O  O  A  A  C  C  O  O  A  A  B  B
   School term III    B  C  C  A  A  B  B  C  C  O  O  B  A  C  C  O  O  A  A  B  B  O  O  A

   Design 2

   Child              1  2  3  4     5  6  7  8
   School term I      C  O  A  B     A  C  B  O
   School term II     C  O  A  B     A  C  B  O
   School term III    C  O  A  B     A  C  B  O

   In Design 2, the same treatment is administered to the same child
   throughout the three school terms. The treatments are randomized
   in blocks of 4 children, who are selected to be as alike as possible.

   Design 3

   Child              1  2  3  4     5  6  7  8
   School term I      O  B  C  A     O  C  B  A
   School term II     B  A  C  O     O  C  A  B
   School term III    A  C  O  B     A  C  O  B

   In Design 3, the treatments within each block of 4 children for each
   term are rerandomized.

6. Assume there are 15 persons who are to be invited to 35 dinners, and
   that 3 persons are to take part in each dinner. Arrange the invitations
   for dinner so that each person is invited 7 times, and 2 persons meet
   at a dinner just 1 time.
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CHAPTER XIV 

MULTIPLE REGRESSION PROBLEMS 


It frequently happens in experimental situations that we are concerned
with the problem of estimating or predicting one character from a
knowledge of another or of a number of other characters. For prediction
or estimation of this kind to be useful, it is necessary that a change in
the variable to be predicted is accompanied by some corresponding
change in the other variable or variables. Problems of this kind require
the quantification of this apparent relationship existing among the
variables and are spoken of as problems in regression.

In the simple case of the regression of one variate on another, the
regression function takes the form

$$ Y' = a + b(X - \bar{X}) \tag{14.01} $$

where $b$ is the regression coefficient of $Y$ on $X$, and $Y'$ is the predicted
value of $Y$ for each value of $X$.

The Multiple Regression Equation. If, instead of having only one
independent variate, such as X in the simple case above, we have measures
on several independent variables, then we can express the mean
value of the dependent variate, Y, in terms of the several independent
variates. This is the multivariate case to be treated in this section.

We denote by $Y_t$ the value of the criterion variable, and by $X_{it}$ the
value of the $i$th measurement, of the $t$th individual. Then the multiple
regression equation (or, more accurately, the partial regression equation)
for obtaining the simple weighted sum of the measurements, $Y'_t$, may be
written

$$ Y'_t = a_0 + b_1 X_{1t} + \cdots + b_k X_{kt} \tag{14.02} $$

where it is assumed that we have the value of the criterion variable,
$Y_t$, and $k$ measurements of each individual. In Equation (14.02), $a_0$ is
a constant; the $b$'s are known as the partial regression coefficients and
are also constants. Instead of the subscript in $b_1$, for instance, the
subscript $y1.23 \ldots k$ or $01.23 \ldots k$ is often used. This subscript
indicates more completely than $b_1$ that the partial regression coefficients
show how greatly unit changes in the individual X variables affect $Y_t$,
independently and directly. The values of these constants are to be
determined in each case from the available data.

If we let $y_t, y'_t, x_1, x_2, \ldots, x_k$ represent the deviations from the
respective means of the variables, there is no need for the term $a_0$ in
Equation (14.02), because

$$\sum y'_t = \sum x_1 = \sum x_2 = \cdots = \sum x_k = 0$$

In terms of this notation, Equation (14.02) becomes

$$y'_t = b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \tag{14.03}$$


In order that $y'_t$ be the best linear estimate of $y_t$ when “best” is
considered in the light of the least-squares criterion, $\sum (y_t - y'_t)^2$ must be
minimized; that is,

$$\sum (y_t - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k)^2 \tag{14.04}$$

must be minimized. A necessary and sufficient condition for this minimum
sum of squares is that the $b$'s satisfy the following system of
equations:

$$\begin{aligned}
\sum (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k)\,x_1 &= 0 \\
\sum (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k)\,x_2 &= 0 \\
&\;\;\vdots \\
\sum (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k)\,x_k &= 0
\end{aligned} \tag{14.05}$$

The left members of these equations are the negative of one-half of the
partial derivatives of (14.04) with respect to $b_1, b_2, \ldots, b_k$.

Equations (14.05) may be written in the form

$$\begin{aligned}
b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + \cdots + b_k \sum x_1 x_k &= \sum x_1 y \\
b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + \cdots + b_k \sum x_2 x_k &= \sum x_2 y \\
&\;\;\vdots \\
b_1 \sum x_1 x_k + b_2 \sum x_2 x_k + \cdots + b_k \sum x_k^2 &= \sum x_k y
\end{aligned} \tag{14.06}$$

The set of equations (14.06) are often called normal equations. After
computing the necessary sums from the given data and substituting these
values in the system of equations, it is possible to solve for the $b$'s in order
to obtain the partial regression coefficients. Substituting these values in
(14.03), we have the multiple regression equation in deviation form. If,
as is usually more convenient, the original measures instead of the deviations
are used, these values of the $b$'s may be substituted in Equation
(14.02). The value of $a_0 = \bar{Y} - b_1 \bar{X}_1 - \cdots - b_k \bar{X}_k$, where the bars
denote the mean values of the several variates.

The accuracy with which the regression coefficients or weights enable
us to predict or estimate the values of the criterion variable is determined
by computing the multiple correlation coefficient. This may be interpreted
as the zero order, or total correlation coefficient between the actual
values of $Y_t$ and the values $Y'_t$ predicted from the multiple regression
equation (14.02). The development of multiple $R$ as a measure of the
accuracy of prediction of a multiple regression equation may be observed
as follows:

$$X^2 = \sum_{t=1}^{N} (Y_t - Y'_t)^2 \tag{14.07}$$

Let $R$ represent the correlation between the two sets of scores, $Y_t$
and $Y'_t$; and let $\sum y^2 = \sum (Y_t - \bar{Y})^2$ = the sum of squares about the
mean of $Y_t$. Equation (14.07) may then be written

$$X^2 = \sum y^2 (1 - R^2) \tag{14.08}$$

from which it follows that the multiple correlation coefficient, $R$, is the
measure of the accuracy with which the criterion scores may be predicted.
It may also be pointed out that the multiple correlation is
another case of the analysis of variance, that is, of analyzing $\sum y^2$ into two
parts, one associated with regression and the other a residual.

The value of $R$, the multiple correlation coefficient, may be readily
calculated from the following equation:

$$R^2 = \frac{b_1 \sum (x_1 y) + b_2 \sum (x_2 y) + \cdots + b_k \sum (x_k y)}{\sum y^2} \tag{14.09}$$
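The computation described by (14.06) through (14.09) can be sketched numerically. The following is only an illustration and not part of the original problem: the data are synthetic, and the names `A`, `c`, `R2`, and `resid_ss` are assumptions chosen for the example. It solves the normal equations in deviation form, recovers $a_0$, and checks that $R^2$ from (14.09) agrees with the squared correlation between $Y_t$ and $Y'_t$, as (14.08) implies.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
N, k = 200, 3
X = rng.normal(size=(N, k))
Y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.5, size=N)

# Deviations from the respective means, as in (14.03).
x = X - X.mean(axis=0)
y = Y - Y.mean()

# Normal equations (14.06): A b = c, with A the sums of squares and products.
A = x.T @ x
c = x.T @ y
b = np.linalg.solve(A, c)

# Constant term a_0 = Ybar - b_1 X1bar - ... - b_k Xkbar.
a0 = Y.mean() - b @ X.mean(axis=0)

# Multiple correlation from (14.09).
R2 = (b @ c) / (y @ y)

# Direct check: correlation between actual and predicted scores.
Y_pred = a0 + X @ b
R_direct = np.corrcoef(Y, Y_pred)[0, 1]
resid_ss = np.sum((Y - Y_pred) ** 2)   # left member of (14.07)
```

Solving with `np.linalg.solve` stands in for the hand elimination described in the text; the coefficients obtained are the same least-squares values.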

The normal equations (14.06) may be modified by dividing both
members of the first equation by $\sqrt{\sum x_1^2 \cdot \sum y^2}$; both members of the
second equation by $\sqrt{\sum x_2^2 \cdot \sum y^2}$; and so on to the kth equation, by
$\sqrt{\sum x_k^2 \cdot \sum y^2}$.

This modification yields the following system:

$$\begin{aligned}
\beta_1 + \beta_2 r_{12} + \cdots + \beta_k r_{1k} &= r_{1Y} \\
\beta_1 r_{12} + \beta_2 + \cdots + \beta_k r_{2k} &= r_{2Y} \\
&\;\;\vdots \\
\beta_1 r_{1k} + \beta_2 r_{2k} + \cdots + \beta_k &= r_{kY}
\end{aligned} \tag{14.10}$$

where $\beta_1 = b_1 \sqrt{\sum x_1^2 / \sum y^2}$, $\beta_2 = b_2 \sqrt{\sum x_2^2 / \sum y^2}$, $\ldots$; and
$\beta_k = b_k \sqrt{\sum x_k^2 / \sum y^2}$. The $\beta$'s
are known as the standard partial regression coefficients, to distinguish
them from the $b$'s, the partial regression coefficients. The $\beta$'s are the
partial regression coefficients for the variates expressed in standard
measure form, thus rendering them independent of the original units of
measurement and giving measures of the comparative weight attributable
to each of the independent variates. In terms of the $\beta$'s, the multiple
correlation coefficient is given as

$$R^2_{Y.123 \ldots k} = \beta_1 r_{1Y} + \beta_2 r_{2Y} + \cdots + \beta_k r_{kY} \tag{14.11}$$



330 


MULTIPLE REGRESSION PROBLEMS [Chap. XIV 


A systematic procedure often used for the solution of the system of
normal equations (14.06) or (14.10) is known as the Doolittle method,
after its formulator, an engineer with the United States Coast and Geodetic
Survey. Doolittle, in 1878, introduced a method which embodied
various improvements over Gauss’s method of solving simultaneous
linear equations by direct substitutions. Some modifications of Doolittle’s
method have occurred from time to time, but the essential features
of his method persist (Refs. 8 and 9). This method is applied
below, but first it is desirable to enumerate what is involved in the complete
analysis of a multiple regression problem.

We have described above the method of setting up the multiple
regression equation and of calculating the criterion of its predictive
accuracy, the multiple correlation coefficient. The values of the $b$’s
or the $\beta$’s alone, however, give a very incomplete description of the
relationships between the dependent variable, $Y$, and the independent
variates, $X_1, \ldots, X_k$. They do not indicate whether all (or, if not all,
which) of the independent variates are significantly related to the
dependent variate; nor can the confidence intervals or fiducial limits be
specified from them within which the true values of the regression
coefficients are to be found. The standard error of the sum or the difference
between two regression coefficients may be needed. Where no apparent
relation is found between the dependent variate and one or more of the
independent variates, it is often desirable to omit such variates from the
regression equation. It may also at times be desirable to add one or more
new independent variates to the original battery. Occasionally, there
may be an interest in the multiple correlation between a certain set of
independent variates and each of several dependent variates. Finally,
when predictions for each individual have been made from the multiple
regression equation, we are interested in the accuracy of each individual
prediction and in setting up a confidence interval for each individual.

To facilitate the carrying out of most of the above analysis, Fisher
(Ref. 13) has suggested the use of a set of auxiliary quantities, $C_{pq}$
($p, q = 1, 2, \ldots, k$). The quantities $C_{p1}, C_{p2}, \ldots, C_{pk}$ are the
solutions of the set of equations (14.06) with the right-hand side of the
pth equation replaced by 1, and of the other equations by 0. The relations
between the regression coefficients and the auxiliaries, $C$’s, are given
by

$$b_i = \sum_{q=1}^{k} C_{iq} \sum (x_q y) \qquad (i = 1, 2, \ldots, k) \tag{14.12}$$

For example, for the case of 3 independent variates the 3 systems of
equations are obtained by using for the right members of the equations
1, 0, 0 for the first system; 0, 1, 0 for the second system; and 0, 0, 1 for
the third system:

$$\begin{aligned}
A_1 \sum x_1^2 + A_2 \sum x_1 x_2 + A_3 \sum x_1 x_3 &= 1,\ 0,\ 0 \\
A_1 \sum x_1 x_2 + A_2 \sum x_2^2 + A_3 \sum x_2 x_3 &= 0,\ 1,\ 0 \\
A_1 \sum x_1 x_3 + A_2 \sum x_2 x_3 + A_3 \sum x_3^2 &= 0,\ 0,\ 1
\end{aligned} \tag{14.13}$$

The three solutions for these three sets of equations may be written

$$\begin{aligned}
A_1 &= C_{11},\ C_{12},\ C_{13} \\
A_2 &= C_{12},\ C_{22},\ C_{23} \\
A_3 &= C_{13},\ C_{23},\ C_{33}
\end{aligned} \tag{14.14}$$

Once the six values of $C$ are known, then the partial regression coefficients
may be obtained in any particular case by calculating $\sum x_1 y$, $\sum x_2 y$, $\sum x_3 y$
and substituting in the following formulas:

$$\begin{aligned}
b_1 &= C_{11} \sum x_1 y + C_{12} \sum x_2 y + C_{13} \sum x_3 y \\
b_2 &= C_{12} \sum x_1 y + C_{22} \sum x_2 y + C_{23} \sum x_3 y \\
b_3 &= C_{13} \sum x_1 y + C_{23} \sum x_2 y + C_{33} \sum x_3 y
\end{aligned} \tag{14.15}$$
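In matrix language the $C$’s of (14.13) and (14.14) are the elements of the inverse of the matrix of sums of squares and products, which is why they can be reused for any dependent variate. A small sketch follows, using synthetic data and hypothetical names (`C`, `y_a`, `y_b`) that are assumptions for illustration only.

```python
import numpy as np

# Synthetic deviations for 3 independent variates (illustrative assumption).
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))
x -= x.mean(axis=0)

A = x.T @ x                        # coefficient matrix of (14.13)

# Solve the three systems of (14.13) at once: right members 1,0,0 / 0,1,0 / 0,0,1.
C = np.linalg.solve(A, np.eye(3))  # columns hold the solutions (14.14)

# Two different dependent variates measured on the same individuals.
y_a = x @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=50)
y_b = x @ np.array([0.3, 0.7, 0.1]) + rng.normal(size=50)
y_a -= y_a.mean()
y_b -= y_b.mean()

# (14.15): b_i = sum_q C_iq * sum(x_q y), reusing the same C for each criterion.
b_a = C @ (x.T @ y_a)
b_b = C @ (x.T @ y_b)
```

The symmetry $C_{pq} = C_{qp}$ is why only six distinct values are needed for three variates.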

Problem XIV. 1. The complete analysis of a regression problem. 

We shall illustrate the complete analysis of a regression problem as it was 
carried out in a study of predicting in the School of Agriculture in the 
University of Minnesota. In this problem it was of interest to secure 
the correlation coefficients between the several variates. Furthermore, 
the use of correlation coefficients in the normal equations provides the 
same order of magnitude for all the quantities at any given step in the 
solution. Their use is also advantageous in the use of the check column 
to be described later. The standard partial regression coefficients rather 
than the partial regression coefficients are used because of the interest 
in comparing the relative importance of the independent variates, which 
originally were in different units of measurement. For this case the
auxiliary set of quantities, the $C$’s used for securing the $b$’s, have been
supplanted by what we call the $g$’s for securing the $\beta$’s.

We have observed 213 individuals with 1 dependent variable and
5 independent variables. Let us denote the dependent variable or the
criterion by $Y$, and the independent variables by $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$.
The scores observed are as follows:

$Y$: honor-point ratios
$X_1$: age
$X_2$: Iowa Silent Reading Test score
$X_3$: Otis raw score
$X_4$: previous education in years
$X_5$: School of Agriculture Reading Test total score
We wish to predict the honor-point ratio from the measures of the
independent variates. The following steps are pursued:

Step 1. Compute all the intercorrelations and the standard deviations.
Let us define:

$$s_y^2 = \frac{\sum y^2}{N}; \qquad s_i^2 = \frac{\sum x_i^2}{N}; \qquad r_{ij} = \frac{\sum x_i x_j}{N s_i s_j}; \qquad r_{iy} = \frac{\sum x_i y}{N s_i s_y}$$

where $i \neq j$ and $i, j = 1, \ldots, 5$. All the measures in our case are
summarized as follows:

$N = 213$; $\bar{X}_1 = 15.9296$; $\bar{X}_2 = 161.9061$; $\bar{X}_3 = 37.8498$; $\bar{X}_4 = 8.9531$;
$\bar{X}_5 = 90.0235$; $\bar{Y} = 2.3362$

$s_1^2 = 5.34245869$; $s_2^2 = 246.33860141$; $s_3^2 = 109.11357981$;
$s_4^2 = 5.26540141$; $s_5^2 = 587.14968357$; $s_y^2 = .70191238$

$s_1 s_2 = 36.27745774$; $s_1 s_3 = 24.14404430$; $s_1 s_4 = 5.30378969$;
$s_1 s_5 = 56.00734941$; $s_2 s_3 = 163.94782712$; $s_2 s_4 = 36.01487742$;
$s_2 s_5 = 380.31255769$; $s_3 s_4 = 23.96928698$; $s_3 s_5 = 253.11264376$;
$s_4 s_5 = 55.60196191$

$r_{12} = .0300$; $r_{13} = .0983$; $r_{14} = .1100$; $r_{15} = .0470$; $r_{23} = .7143$;
$r_{24} = .0960$; $r_{25} = .8203$; $r_{34} = .1821$; $r_{35} = .7230$; $r_{45} = .1124$

$r_{1y} = .1784$; $r_{2y} = .6505$; $r_{3y} = .5164$; $r_{4y} = .0993$; $r_{5y} = .6704$ ¹

Step 2. Compute Fisher’s auxiliary statistics $(g_{ij})$’s. The 5 systems
of simultaneous equations to be solved are given below, with the right
members of systems (1), (2), (3), (4), and (5) listed in that order:

$$\begin{aligned}
g_1 + r_{12} g_2 + r_{13} g_3 + r_{14} g_4 + r_{15} g_5 &= 1,\ 0,\ 0,\ 0,\ 0 \\
r_{12} g_1 + g_2 + r_{23} g_3 + r_{24} g_4 + r_{25} g_5 &= 0,\ 1,\ 0,\ 0,\ 0 \\
r_{13} g_1 + r_{23} g_2 + g_3 + r_{34} g_4 + r_{35} g_5 &= 0,\ 0,\ 1,\ 0,\ 0 \\
r_{14} g_1 + r_{24} g_2 + r_{34} g_3 + g_4 + r_{45} g_5 &= 0,\ 0,\ 0,\ 1,\ 0 \\
r_{15} g_1 + r_{25} g_2 + r_{35} g_3 + r_{45} g_4 + g_5 &= 0,\ 0,\ 0,\ 0,\ 1
\end{aligned} \tag{14.16}$$

The values obtained for the $g$’s in the first system will be designated
by $g_{11}$, $g_{21}$, $g_{31}$, $g_{41}$, and $g_{51}$. The values obtained for the $g$’s in the second,
the third, the fourth, and the fifth systems will be designated by $g_{12}$, $g_{22}$,
$g_{32}$, $g_{42}$, and $g_{52}$; by $g_{13}$, $g_{23}$, $g_{33}$, $g_{43}$, and $g_{53}$; by $g_{14}$, $g_{24}$, $g_{34}$, $g_{44}$, and $g_{54}$; and by
$g_{15}$, $g_{25}$, $g_{35}$, $g_{45}$, and $g_{55}$, respectively. It is worthy of note that

$$g_{ij} = g_{ji} \qquad (i \neq j;\ i, j = 1, \ldots, 5) \tag{14.17}$$

¹ We have used 4 decimal places in our calculations. This is likely a minimum
number with the number of equations and of unknowns used. As the number of
equations and unknowns increases, the Doolittle and other similar methods of
elimination require increasingly larger numbers of decimal places. For example, probably
at least 10 places would be necessary for 10 unknowns if a final answer of 1-place
accuracy is wanted (Ref. 17).

For our problem, (14.16) becomes

$$\begin{aligned}
\text{(I)}: \quad & 1.0000 g_1 + .0300 g_2 + .0983 g_3 + .1100 g_4 + .0470 g_5 = 1,\ 0,\ 0,\ 0,\ 0 \\
\text{(II)}: \quad & .0300 g_1 + 1.0000 g_2 + .7143 g_3 + .0960 g_4 + .8203 g_5 = 0,\ 1,\ 0,\ 0,\ 0 \\
\text{(III)}: \quad & .0983 g_1 + .7143 g_2 + 1.0000 g_3 + .1821 g_4 + .7230 g_5 = 0,\ 0,\ 1,\ 0,\ 0 \\
\text{(IV)}: \quad & .1100 g_1 + .0960 g_2 + .1821 g_3 + 1.0000 g_4 + .1124 g_5 = 0,\ 0,\ 0,\ 1,\ 0 \\
\text{(V)}: \quad & .0470 g_1 + .8203 g_2 + .7230 g_3 + .1124 g_4 + 1.0000 g_5 = 0,\ 0,\ 0,\ 0,\ 1
\end{aligned}$$

A systematic procedure often used for the solution of such a system
of equations is shown in Table 100. A convenient check column is often
carried along, to the right of these computations. The first and second
entries of this check column are found by adding all other entries in their
respective rows. The third entry is found in two ways, thus yielding a
check on the accuracy of the arithmetical computations. The first way
consists in the addition of all other entries in the third row. The other
way consists in operating on the first entry in accordance with the directions
given at the left. The other entries in the check column are found
in a similar way.²

The values of $g_{51}$, $g_{52}$, $g_{53}$, $g_{54}$, and $g_{55}$ can be read directly from the
last row, numbered (23):

$g_{51} = -.0024$; $g_{52} = -2.1493$; $g_{53} = -.9678$; $g_{54} = -.0062$;
$g_{55} = 3.4638$

We get $g_{41}$, $g_{42}$, $g_{43}$, $g_{44}$, and $g_{45}$ as follows:

Substitute $g_{51}$ in Eq. (16, Table 100) and use column $F$ in the right-hand
member:

$g_{41} + .0018(-.0024) = -.0946$
$g_{41} = -.0946$

Substitute $g_{52}$ in Eq. (16, Table 100) and use column $F'$ in the right
member:

$g_{42} + .0018(-2.1493) = .0649$
$g_{42} = .0688$

Substitute $g_{53}$ in Eq. (16, Table 100) and use column $F''$ in the right
member:

$g_{43} + .0018(-.9678) = -.2276$
$g_{43} = -.2258$

Substitute $g_{54}$ in Eq. (16, Table 100) and use column $F'''$ in the right
member:

$g_{44} + .0018(-.0062) = 1.0456$
$g_{44} = 1.0456$

Substitute $g_{55}$ in Eq. (16, Table 100) and use column $F^{(4)}$ in the right
member:

$g_{45} + .0018(3.4638) = 0$
$g_{45} = -.0062$

² It should be noted that errors occurring in the rounding of the original
correlations are not accounted for by the check column.

To obtain $g_{31}$, $g_{32}$, $g_{33}$, $g_{34}$, and $g_{35}$:

Substitute $g_{41}$ and $g_{51}$ in Eq. (10, Table 100) and use $F$ in the right
member:

$g_{31} + .2176(-.0946) + .2798(-.0024) = -.1589$
$g_{31} = -.1377$

Substitute $g_{42}$ and $g_{52}$ in Eq. (10, Table 100) and use $F'$ in the right
member:

$g_{32} + .2176(.0688) + .2798(-2.1493) = -1.4712$
$g_{32} = -.8847$

Substitute $g_{43}$ and $g_{53}$ in Eq. (10, Table 100) and use $F''$ in the right
member:

$g_{33} + .2176(-.2258) + .2798(-.9678) = 2.0665$
$g_{33} = 2.3864$

Substitute $g_{44}$ and $g_{54}$ in Eq. (10, Table 100) and use $F'''$ in the right
member:

$g_{34} + .2176(1.0456) + .2798(-.0062) = 0$
$g_{34} = -.2258$

Substitute $g_{45}$ and $g_{55}$ in Eq. (10, Table 100) and use $F^{(4)}$ in the right
member:

$g_{35} + .2176(-.0062) + .2798(3.4638) = 0$
$g_{35} = -.9678$

To obtain $g_{21}$, $g_{22}$, $g_{23}$, $g_{24}$, and $g_{25}$:

Substitute $g_{31}$, $g_{41}$, and $g_{51}$ in Eq. (5, Table 100) and use $F$ in the
right member:

$g_{21} + .7119(-.1377) + .0928(-.0946) + .8196(-.0024) = -.0300$
$g_{21} = .0788$

Substitute $g_{32}$, $g_{42}$, and $g_{52}$ in Eq. (5, Table 100) and use $F'$ in the right
member:

$g_{22} + .7119(-.8847) + .0928(.0688) + .8196(-2.1493) = 1.0009$
$g_{22} = 3.3859$

Substitute $g_{33}$, $g_{43}$, and $g_{53}$ in Eq. (5, Table 100) and use $F''$ in the right
member:

$g_{23} + .7119(2.3864) + .0928(-.2258) + .8196(-.9678) = 0$
$g_{23} = -.8847$

Substitute $g_{34}$, $g_{44}$, and $g_{54}$ in Eq. (5, Table 100) and use $F'''$ in the
right member:

$g_{24} + .7119(-.2258) + .0928(1.0456) + .8196(-.0062) = 0$
$g_{24} = .0688$

Substitute $g_{35}$, $g_{45}$, and $g_{55}$ in Eq. (5, Table 100) and use $F^{(4)}$ in the
right member:

$g_{25} + .7119(-.9678) + .0928(-.0062) + .8196(3.4638) = 0$
$g_{25} = -2.1493$

To obtain $g_{11}$, $g_{12}$, $g_{13}$, $g_{14}$, and $g_{15}$:

Substitute $g_{21}$, $g_{31}$, $g_{41}$, and $g_{51}$ in Eq. (1, Table 100) and use $F$ in the
right member:

$g_{11} + .0300(.0788) + .0983(-.1377) + .1100(-.0946) + .0470(-.0024) = 1$
$g_{11} = 1.0217$

Substitute $g_{22}$, $g_{32}$, $g_{42}$, and $g_{52}$ in Eq. (1, Table 100) and use $F'$ in the
right member:

$g_{12} + .0300(3.3859) + .0983(-.8847) + .1100(.0688) + .0470(-2.1493) = 0$
$g_{12} = .0788$

Substitute $g_{23}$, $g_{33}$, $g_{43}$, and $g_{53}$ in Eq. (1, Table 100) and use $F''$ in the
right member:

$g_{13} + .0300(-.8847) + .0983(2.3864) + .1100(-.2258) + .0470(-.9678) = 0$
$g_{13} = -.1377$

Substitute $g_{24}$, $g_{34}$, $g_{44}$, and $g_{54}$ in Eq. (1, Table 100) and use $F'''$ in the
right member:

$g_{14} + .0300(.0688) + .0983(-.2258) + .1100(1.0456) + .0470(-.0062) = 0$
$g_{14} = -.0946$

Substitute $g_{25}$, $g_{35}$, $g_{45}$, and $g_{55}$ in Eq. (1, Table 100) and use $F^{(4)}$ in the
right member:

$g_{15} + .0300(-2.1493) + .0983(-.9678) + .1100(-.0062) + .0470(3.4638) = 0$
$g_{15} = -.0024$

The accuracy of the $(g_{ij})$’s ($i \neq j$) can be checked by the equation
$g_{ij} = g_{ji}$, and the accuracy of the $(g_{ii})$’s can be checked by a method
illustrated by Wallace and Snedecor (Ref. 25).³ It is shown that to
obtain $g_{11}$, the sum of products of the last two members (regardless of
sign) in each section in column ($F$) is found; similarly, for $g_{22}$ the same
procedure is followed in column ($F'$), and so on. In our problem

$g_{11} = 1 + .0300(.0300) + .0769(.1589) + .0905(.0946) + .0007(.0024) = 1.0217$

$g_{22} = 1.0000(1.0009) + .7119(1.4712) + .0621(.0649) + .6205(2.1493) = 3.3859$

$g_{33} = 1.0000(2.0665) + .2176(.2275) + .2794(.9678) = 2.3864$

$g_{44} = 1.0000(1.0456) + .0018(.0062) = 1.0456$

$g_{55} = 1.0000(3.4638) = 3.4638$

Since we have checked all our results, we shall summarize them as
follows:

$g_{11} = 1.0217$;  $g_{12} = .0788$;   $g_{13} = -.1377$;  $g_{14} = -.0946$;  $g_{15} = -.0024$
$g_{21} = .0788$;   $g_{22} = 3.3859$;  $g_{23} = -.8847$;  $g_{24} = .0688$;   $g_{25} = -2.1493$
$g_{31} = -.1377$;  $g_{32} = -.8847$;  $g_{33} = 2.3864$;  $g_{34} = -.2258$;  $g_{35} = -.9678$
$g_{41} = -.0946$;  $g_{42} = .0688$;   $g_{43} = -.2258$;  $g_{44} = 1.0456$;  $g_{45} = -.0062$
$g_{51} = -.0024$;  $g_{52} = -2.1493$; $g_{53} = -.9678$;  $g_{54} = -.0062$;  $g_{55} = 3.4638$


Step 3. Compute $R_{y.12345}$, the multiple correlation between $Y$ and
$X_1$, $X_2$, $X_3$, $X_4$, $X_5$.

Define:

$$\beta_i = \sum_j g_{ij} r_{jy} \qquad (i, j = 1, \ldots, 5) \tag{14.18}$$

where $\beta_i$ is the standard partial regression coefficient. For our problem,
we have

$\beta_1 = g_{11} r_{1y} + g_{12} r_{2y} + g_{13} r_{3y} + g_{14} r_{4y} + g_{15} r_{5y} = .1514$
$\beta_2 = g_{21} r_{1y} + g_{22} r_{2y} + g_{23} r_{3y} + g_{24} r_{4y} + g_{25} r_{5y} = .3256$
$\beta_3 = g_{31} r_{1y} + g_{32} r_{2y} + g_{33} r_{3y} + g_{34} r_{4y} + g_{35} r_{5y} = -.0390$
$\beta_4 = g_{41} r_{1y} + g_{42} r_{2y} + g_{43} r_{3y} + g_{44} r_{4y} + g_{45} r_{5y} = .0109$
$\beta_5 = g_{51} r_{1y} + g_{52} r_{2y} + g_{53} r_{3y} + g_{54} r_{4y} + g_{55} r_{5y} = .4232$

Define again:

$$R^2_{y.12345} = \sum_i \beta_i r_{iy} \qquad (i = 1, \ldots, 5) \tag{14.19}$$

³ This method does not provide a complete guarantee of accuracy, since in some
instances large errors in the solution might give only small deviations of the left from
the right member of the equation, and since some deviations are to be expected when
only a limited number of decimal places are carried along.

For our problem, we have

$R^2_{y.12345} = \beta_1 r_{1y} + \beta_2 r_{2y} + \beta_3 r_{3y} + \beta_4 r_{4y} + \beta_5 r_{5y} = .50346861$

Therefore, we obtain

$R_{y.12345} = .7096$
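Steps 2 and 3 amount to inverting the matrix of intercorrelations and applying (14.18) and (14.19). The sketch below repeats the computation with a library inverse in place of the Doolittle elimination; it is an illustration only, the names `R`, `G`, `beta`, and `R2` are assumptions, and small differences from the tabled values are to be expected because the hand solution carried 4 decimal places.

```python
import numpy as np

# Intercorrelations of X1..X5 from Step 1.
R = np.array([
    [1.0000, .0300, .0983, .1100, .0470],
    [.0300, 1.0000, .7143, .0960, .8203],
    [.0983, .7143, 1.0000, .1821, .7230],
    [.1100, .0960, .1821, 1.0000, .1124],
    [.0470, .8203, .7230, .1124, 1.0000],
])
# Correlations of each X with the criterion Y.
r_y = np.array([.1784, .6505, .5164, .0993, .6704])

# Fisher's auxiliary statistics g_ij: solutions of (14.16), i.e. the inverse of R.
G = np.linalg.inv(R)

# Standard partial regression coefficients (14.18).
beta = G @ r_y

# Squared multiple correlation (14.19) and R itself.
R2 = beta @ r_y
R_mult = np.sqrt(R2)

# Values summarized at the end of Step 2, for comparison.
G_book = np.array([
    [1.0217, .0788, -.1377, -.0946, -.0024],
    [.0788, 3.3859, -.8847, .0688, -2.1493],
    [-.1377, -.8847, 2.3864, -.2258, -.9678],
    [-.0946, .0688, -.2258, 1.0456, -.0062],
    [-.0024, -2.1493, -.9678, -.0062, 3.4638],
])
max_diff = np.max(np.abs(G - G_book))
```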

Step 4. Test the significance of $R_{y.12345}$. This can be done through
the use of the variance ratio. The method is shown below.

Analysis-of-Variance Table

Source of variation              Sum of squares         D.F.          Mean square
Not associated with regression   $(1 - R^2) \sum y^2$   $N - m - 1$   $(1 - R^2) \sum y^2 / (N - m - 1)$
Associated with regression       $R^2 \sum y^2$         $m$           $R^2 \sum y^2 / m$
Total                            $\sum y^2$             $N - 1$

$$F \text{ (variance ratio)} = \frac{R^2 (N - m - 1)}{m (1 - R^2)}$$

For our problem,

$$F = \frac{.5035(207)}{5(.4965)} = 41.98 \tag{14.20}$$

Referring to the F tables (Table IV, Appendix) with $n_1 = 5$ and $n_2 = 207$,
we have $P < .01$. Therefore, we conclude that the value of the multiple
correlation is significantly different from zero.
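The variance ratio of Step 4 is a one-line computation. The sketch below is illustrative only; the function name `variance_ratio` is an assumption, and the inputs are the rounded figures used in (14.20).

```python
def variance_ratio(r2, n, m):
    """F for a multiple correlation: mean square for regression over residual mean square."""
    return r2 * (n - m - 1) / (m * (1.0 - r2))

# Rounded values from the text: R^2 = .5035, N = 213, m = 5.
f_full = variance_ratio(0.5035, 213, 5)
df_regression, df_residual = 5, 213 - 5 - 1
```

The probability level is then read from an F table with `df_regression` and `df_residual` degrees of freedom, as in the text.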

Step 5. Test the significance of the $(\beta_i)$’s.

Define:

$$s_{\beta_i} = \sqrt{\frac{(1 - R^2_{y.12345})\, g_{ii}}{N - m - 1}} \qquad (i = 1, \ldots, 5) \tag{14.21}$$

where $s_{\beta_i}$ is the standard error of $\beta_i$. For our problem we have:

$s_{\beta_1} = \sqrt{(1 - R^2_{y.12345})\, g_{11} / (N - m - 1)} = .0495$
$s_{\beta_2} = \sqrt{(1 - R^2_{y.12345})\, g_{22} / (N - m - 1)} = .0901$
$s_{\beta_3} = \sqrt{(1 - R^2_{y.12345})\, g_{33} / (N - m - 1)} = .0757$
$s_{\beta_4} = \sqrt{(1 - R^2_{y.12345})\, g_{44} / (N - m - 1)} = .0501$
$s_{\beta_5} = \sqrt{(1 - R^2_{y.12345})\, g_{55} / (N - m - 1)} = .0911$

The test of significance of each $\beta_i$ is given by

$$t_{\beta_i} = \frac{\beta_i}{s_{\beta_i}} \tag{14.22}$$

with $N - m - 1$ degrees of freedom. For our problem, we obtain

$t_{\beta_1} = 3.059$; $t_{\beta_2} = 3.614$; $t_{\beta_3} = -.515$;
$t_{\beta_4} = .218$; $t_{\beta_5} = 4.645$

Referring to the t-table with 207 degrees of freedom, we find that $\beta_1$, $\beta_2$,
and $\beta_5$ are significant at the 1 per cent level, and that $\beta_3$ and $\beta_4$ are not
significantly different from zero. Therefore, we can omit the independent
variables $X_3$ and $X_4$.⁴
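The standard errors and t-values of Step 5 follow directly from the diagonal $g_{ii}$. The sketch below uses the figures tabled in Steps 2 and 3; the variable names are assumptions for illustration, and results agree with the text only to the rounding carried there.

```python
import math

N, m = 213, 5
R2 = 0.50346861                                    # from Step 3
g_diag = [1.0217, 3.3859, 2.3864, 1.0456, 3.4638]  # g_11 ... g_55 from Step 2
beta = [0.1514, 0.3256, -0.0390, 0.0109, 0.4232]   # from Step 3

# (14.21): standard error of each standard partial regression coefficient.
se = [math.sqrt((1.0 - R2) * g / (N - m - 1)) for g in g_diag]

# (14.22): t = beta / s_beta, with N - m - 1 = 207 degrees of freedom.
t = [b / s for b, s in zip(beta, se)]
```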

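Step 6. The omission of $X_3$: Let us denote by $\beta'_i$ and $g'_{ij}$ the new
standard regression coefficients and auxiliary statistics, respectively. By
mathematical derivations, we have

$$\beta'_i = \beta_i - \frac{g_{i3}}{g_{33}}\, \beta_3 \qquad (i = 1, 2, 4, 5) \tag{14.23}$$

$$g'_{ij} = g_{ij} - \frac{g_{i3}\, g_{j3}}{g_{33}} \qquad (i, j = 1, 2, 4, 5) \tag{14.24}$$

For our problem, we can easily obtain

$\beta'_1 = \beta_1 - \frac{g_{13}}{g_{33}} \beta_3 = .1492$;   $\beta'_2 = \beta_2 - \frac{g_{23}}{g_{33}} \beta_3 = .3112$
$\beta'_4 = \beta_4 - \frac{g_{43}}{g_{33}} \beta_3 = .0072$;   $\beta'_5 = \beta_5 - \frac{g_{53}}{g_{33}} \beta_3 = .4074$

$g'_{11} = g_{11} - \frac{g_{13}^2}{g_{33}} = 1.0138$;   $g'_{12} = g_{12} - \frac{g_{13} g_{23}}{g_{33}} = .0278$
$g'_{14} = g_{14} - \frac{g_{13} g_{43}}{g_{33}} = -.1076$;   $g'_{15} = g_{15} - \frac{g_{13} g_{53}}{g_{33}} = -.0582$
$g'_{22} = g_{22} - \frac{g_{23}^2}{g_{33}} = 3.0579$;   $g'_{24} = g_{24} - \frac{g_{23} g_{43}}{g_{33}} = -.0149$
$g'_{25} = g_{25} - \frac{g_{23} g_{53}}{g_{33}} = -2.5081$;   $g'_{44} = g_{44} - \frac{g_{43}^2}{g_{33}} = 1.0242$
$g'_{45} = g_{45} - \frac{g_{43} g_{53}}{g_{33}} = -.0978$;   $g'_{55} = g_{55} - \frac{g_{53}^2}{g_{33}} = 3.0713$

Proceeding as before, we have

$R^2_{y.1245} = \beta'_1 r_{1y} + \beta'_2 r_{2y} + \beta'_4 r_{4y} + \beta'_5 r_{5y} = .5028880$
$R_{y.1245} = .7091$

$$F = \frac{R^2 (N - m' - 1)}{m' (1 - R^2)} = 52.6067$$

where $m' = 4$. Referring to the F-table with $n_1 = 4$ and $n_2 = 208$, we
find that $P < .01$. Therefore, we conclude that the $R$ is significant.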
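Formulas (14.23) and (14.24) update the coefficients and auxiliary statistics when a variate is dropped, without solving the reduced equations afresh. The sketch below is illustrative (the helper name `drop_variable` is an assumption): it applies the update to the correlations of Step 1 and compares the result with a direct inversion of the reduced correlation matrix.

```python
import numpy as np

def drop_variable(G, beta, k):
    """Apply (14.23) and (14.24): omit variate k (0-based index) from G and beta."""
    keep = [i for i in range(len(beta)) if i != k]
    g_k = G[keep, k]                       # elements g_ik for the retained variates
    G_new = G[np.ix_(keep, keep)] - np.outer(g_k, g_k) / G[k, k]
    beta_new = beta[keep] - g_k * beta[k] / G[k, k]
    return G_new, beta_new

# Intercorrelations and criterion correlations from Step 1.
R = np.array([
    [1.0000, .0300, .0983, .1100, .0470],
    [.0300, 1.0000, .7143, .0960, .8203],
    [.0983, .7143, 1.0000, .1821, .7230],
    [.1100, .0960, .1821, 1.0000, .1124],
    [.0470, .8203, .7230, .1124, 1.0000],
])
r_y = np.array([.1784, .6505, .5164, .0993, .6704])

G = np.linalg.inv(R)
beta = G @ r_y

G1, beta1 = drop_variable(G, beta, 2)     # omit X3, leaving X1, X2, X4, X5
G2, beta2 = drop_variable(G1, beta1, 2)   # then omit X4, leaving X1, X2, X5

# Direct solution of the reduced system, for comparison.
keep = [0, 1, 4]
G2_direct = np.linalg.inv(R[np.ix_(keep, keep)])
beta2_direct = G2_direct @ r_y[keep]
```

The agreement of `G2` with `G2_direct` reflects the fact that the update is exact; only rounding separates the hand results from a fresh solution.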

⁴ For a discussion of “suppression” variables which might increase the multiple
correlation even if they correlate zero or near zero with the criterion, see Refs. 14 and

In testing the significance of the $(\beta'_i)$’s, we obtain

$s_{\beta'_1} = \sqrt{(1 - R^2_{y.1245})\, g'_{11} / (N - m' - 1)} = .0492$
$s_{\beta'_2} = \sqrt{(1 - R^2_{y.1245})\, g'_{22} / (N - m' - 1)} = .0855$
$s_{\beta'_4} = \sqrt{(1 - R^2_{y.1245})\, g'_{44} / (N - m' - 1)} = .0495$
$s_{\beta'_5} = \sqrt{(1 - R^2_{y.1245})\, g'_{55} / (N - m' - 1)} = .0857$

Consequently, we obtain

$t_{\beta'_1} = \beta'_1 / s_{\beta'_1} = 3.033$;   $t_{\beta'_2} = \beta'_2 / s_{\beta'_2} = 3.640$
$t_{\beta'_4} = \beta'_4 / s_{\beta'_4} = .145$;   $t_{\beta'_5} = \beta'_5 / s_{\beta'_5} = 4.754$

Referring to the t-table with 208 degrees of freedom, we find that $\beta'_1$, $\beta'_2$,
and $\beta'_5$ are significant at the 1 per cent level and that $\beta'_4$ is not significantly
different from 0. Therefore, it is desirable to omit the independent
variable $X_4$.

Step 7. The omission of both⁵ $X_3$ and $X_4$. Let us denote by $\beta''_i$
and $g''_{ij}$ the new standard regression coefficients and auxiliary statistics,
respectively. By mathematical derivations, we have

$$\beta''_i = \beta'_i - \frac{g'_{i4}}{g'_{44}}\, \beta'_4 \qquad (i = 1, 2, 5) \tag{14.25}$$

$$g''_{ij} = g'_{ij} - \frac{g'_{i4}\, g'_{j4}}{g'_{44}} \qquad (i, j = 1, 2, 5) \tag{14.26}$$

For our problem, we can easily obtain

$\beta''_1 = \beta'_1 - \frac{g'_{14}}{g'_{44}} \beta'_4 = .1500$;   $\beta''_2 = \beta'_2 - \frac{g'_{24}}{g'_{44}} \beta'_4 = .3113$;   $\beta''_5 = \beta'_5 - \frac{g'_{54}}{g'_{44}} \beta'_4 = .4081$

$g''_{11} = g'_{11} - \frac{g'^{\,2}_{14}}{g'_{44}} = 1.0025$;   $g''_{12} = g'_{12} - \frac{g'_{14} g'_{24}}{g'_{44}} = .0262$
$g''_{15} = g'_{15} - \frac{g'_{14} g'_{54}}{g'_{44}} = -.0685$;   $g''_{22} = g'_{22} - \frac{g'^{\,2}_{24}}{g'_{44}} = 3.0577$
$g''_{25} = g'_{25} - \frac{g'_{24} g'_{54}}{g'_{44}} = -2.5095$;   $g''_{55} = g'_{55} - \frac{g'^{\,2}_{54}}{g'_{44}} = 3.0620$

⁵ The advantage of the use of the g-statistics over the method of re-solving the
normal equations depends upon the number of independent variates. If the number
of originally independent variates is 6 or more, or perhaps 5, and if 2 are to be
eliminated, the use of the g-statistics is advisable.

Proceeding as before, we have

$R^2_{y.125} = \beta''_1 r_{1y} + \beta''_2 r_{2y} + \beta''_5 r_{5y} = .50285089$
$R_{y.125} = .7091$

$$F = \frac{R^2 (N - m'' - 1)}{m'' (1 - R^2)}$$

where $m'' = 3$. Referring to the F-table with $n_1 = 3$ and $n_2 = 209$, we have
$P < .01$. Therefore, we conclude that the $R$ is significantly different
from zero.

In testing the significance of the standard partial regression coefficients,
we have

$s_{\beta''_1} = \sqrt{(1 - R^2_{y.125})\, g''_{11} / (N - m'' - 1)} = .0488$
$s_{\beta''_2} = \sqrt{(1 - R^2_{y.125})\, g''_{22} / (N - m'' - 1)} = .0853$
$s_{\beta''_5} = \sqrt{(1 - R^2_{y.125})\, g''_{55} / (N - m'' - 1)} = .0853$

Consequently, we have

$t_{\beta''_1} = \beta''_1 / s_{\beta''_1} = 3.074$;   $t_{\beta''_2} = \beta''_2 / s_{\beta''_2} = 3.649$;   $t_{\beta''_5} = \beta''_5 / s_{\beta''_5} = 4.784$

Referring to the t-table with 209 degrees of freedom, we find that all the
standard partial regression coefficients are significant. Therefore, we
conclude that the three independent variables $X_1$, $X_2$, and $X_5$ should be
used in order to predict the dependent variable $Y$.

Step 8. Set up the formula for the prediction of $Y$ from $X_1$, $X_2$, and
$X_5$. First we calculate the partial regression coefficients. Define:

$$b_i = \beta_i \frac{s_y}{s_i} \qquad (i = 1, \ldots, 5) \tag{14.27}$$

Similarly,

$$b'_i = \beta'_i \frac{s_y}{s_i} \qquad (i = 1, 2, 4, 5) \tag{14.28}$$

$$b''_i = \beta''_i \frac{s_y}{s_i} \qquad (i = 1, 2, 5) \tag{14.29}$$

For our problem, we do not need to use $b_i$ and $b'_i$. Therefore, we have

$b''_1 = \beta''_1 \frac{s_y}{s_1} = .0544$;   $b''_2 = \beta''_2 \frac{s_y}{s_2} = .0166$;   $b''_5 = \beta''_5 \frac{s_y}{s_5} = .0141$

where $s_y = .837802$, $s_1 = 2.311376$, $s_2 = 15.695178$, and $s_5 = 24.231172$,
which are calculated from the results in Step 1. Denoting by $\hat{Y}$ the
predicted Y-score, we have

$$\hat{Y} = \bar{Y} + \sum_i b_i x_i \qquad (i = 1, 2, \ldots, 5) \tag{14.30}$$

Similarly,

$$\hat{Y} = \bar{Y} + \sum_i b'_i x_i \qquad (i = 1, 2, 4, 5) \tag{14.31}$$

$$\hat{Y} = \bar{Y} + \sum_i b''_i x_i \qquad (i = 1, 2, 5) \tag{14.32}$$

Again, for our problem we simply use (14.32). Then we obtain

$$\begin{aligned}
\hat{Y} &= \bar{Y} + b''_1 (X_1 - \bar{X}_1) + b''_2 (X_2 - \bar{X}_2) + b''_5 (X_5 - \bar{X}_5) \\
&= 2.3362 - .0544(15.9296) - .0166(161.9061) - .0141(90.0235) + .0544 X_1 + .0166 X_2 + .0141 X_5 \\
&= .0544 X_1 + .0166 X_2 + .0141 X_5 - 2.4873
\end{aligned}$$
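The conversion of Step 8 from standard to raw-score coefficients, and the resulting prediction equation, can be retraced in a few lines. This is a sketch using the figures quoted in the text; the names `beta2`, `b`, `intercept`, and `predict` are assumptions for illustration, and the coefficients are rounded to 4 places as in the text.

```python
# Standard partial regression coefficients after omitting X3 and X4 (Step 7).
beta2 = {1: 0.1500, 2: 0.3113, 5: 0.4081}

# Standard deviations and means from Step 1.
s_y = 0.837802
s = {1: 2.311376, 2: 15.695178, 5: 24.231172}
means = {1: 15.9296, 2: 161.9061, 5: 90.0235}
y_mean = 2.3362

# (14.29): b''_i = beta''_i * s_y / s_i, rounded to 4 places as in the text.
b = {i: round(beta2[i] * s_y / s[i], 4) for i in beta2}

# Constant term obtained by expanding (14.32) in terms of the original scores.
intercept = y_mean - sum(b[i] * means[i] for i in b)

def predict(x1, x2, x5):
    """Predicted honor-point ratio from age and the two reading-test scores."""
    return intercept + b[1] * x1 + b[2] * x2 + b[5] * x5

y_hat = predict(16, 163, 80)
```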

It is to be noted that this predicted score refers to the true mean
Y-score of all the individuals who have the same specific scores of $X_1$, $X_2$,
and $X_5$ in the population. In other words, in the long run, the true mean
Y-score of all the individuals who have identical scores for $X_1$, $X_2$, and $X_5$
in the population will approach $\hat{Y}$ within a fiducial limit which we may
set up. Since in our example we do not find two individuals with the
same scores for $X_1$, $X_2$, and $X_5$, it is difficult to verify the accuracy of the
predicted score.

Step 9. Compute the standard error of an individual predicted score.
The general formula using the auxiliary statistics $(g_{ij})$’s for predicting
the standard error of an individual predicted score is

$$s_{\hat{Y}} = \sqrt{\frac{s_y^2 (1 - R^2)}{N - m - 1} \left[ 1 + \sum_i \sum_j \frac{g_{ij}\, x_i x_j}{s_i s_j} \right]} \tag{14.33}$$

where $i, j = 1, \ldots, m$; $m$ is the number of independent variables; $s_y^2$, the
estimate of the population variance, and the $s_i$’s and $s_j$’s and the $x$’s
are defined as before.⁶ For our problem, we simply use $i, j = 1, 2, 5$ and
change $m$ to $m'' = 3$ and $(g_{ij})$’s to $(g''_{ij})$’s. Thus we have

$$s_{\hat{Y}} = \sqrt{\frac{s_y^2 (1 - R^2_{y.125})}{N - m'' - 1} \left[ 1 + \sum_i \sum_j \frac{g''_{ij}\, x_i x_j}{s_i s_j} \right]} \qquad (i, j = 1, 2, 5)$$

⁶ The working formula can also be written as follows:

Step 10. Find the fiducial limits.

Fiducial limits ($p$) = $\hat{Y} \pm t_{(q)} (s_{\hat{Y}})$

where $p + q = 1$ and the t-value is obtained by referring to the t-table
with $N - m - 1$ (for our problem, $N - m'' - 1$) degrees of freedom.
It may be stated with a confidence coefficient of $100p$ per cent that the
true mean Y-score of all the individuals who have identical $X_1$, $X_2$, and
$X_5$-scores will lie in the range of $\hat{Y} \pm t_{(q)} (s_{\hat{Y}})$. It is customary to make
$p = .99$ or $.95$. For our problem, $N - m'' - 1 = 209$. Referring to
the t-table with 209 degrees of freedom, we find that $t_{.01} = 2.600$ and
$t_{.05} = 1.972$. Therefore, we have

Fiducial limits (.99) = $\hat{Y} \pm 2.600 (s_{\hat{Y}})$

Fiducial limits (.95) = $\hat{Y} \pm 1.972 (s_{\hat{Y}})$

Step 11. Practical application: To find the fiducial limits for the
true mean Y-score for the following individual values:

$X_1 = 16$; $X_2 = 163$; $X_5 = 80$; $Y = 2.316$

$\hat{Y} = -2.4873 + .0544(16) + .0166(163) + .0141(80) = 2.217$

$x_1 = .0704$; $x_2 = 1.0939$; $x_5 = -10.0235$

$$s_{\hat{Y}} = \sqrt{.001670 \left[ \begin{aligned} & 1 + .1876(.0704)^2 + .0124(1.0939)^2 + .0052(-10.0235)^2 \\ & + .0014(.0704)(1.0939) - .0024(.0704)(-10.0235) \\ & - .0132(1.0939)(-10.0235) \end{aligned} \right]} = .0530$$

Fiducial limits (.99) = 2.217 ± 2.600(.0530) = (2.079, 2.355)
Fiducial limits (.95) = 2.217 ± 1.972(.0530) = (2.112, 2.322)
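The arithmetic of Steps 9 to 11 can be retraced as follows. The form of the standard error used here is inferred from the worked figures of Step 11 (the multiplier .001670 equals $s_y^2(1 - R^2)/(N - m'' - 1)$ and each bracketed coefficient equals $g''_{ij}/(s_i s_j)$, doubled for the cross products); it should be read as an assumption consistent with those figures rather than a quotation of the printed formula. Variable names are illustrative.

```python
import numpy as np

N, m = 213, 3                         # m'' = 3 retained variates: X1, X2, X5
s_y2 = 0.70191238                     # variance of Y from Step 1
R2 = 0.50285089                       # R^2 for X1, X2, X5 from Step 7
s = np.sqrt([5.34245869, 246.33860141, 587.14968357])   # s_1, s_2, s_5

# Auxiliary statistics g''_ij from Step 7 (symmetric).
G2 = np.array([
    [1.0025, 0.0262, -0.0685],
    [0.0262, 3.0577, -2.5095],
    [-0.0685, -2.5095, 3.0620],
])

# Deviations of the individual's scores from the means (Step 11).
x = np.array([0.0704, 1.0939, -10.0235])
y_hat = 2.217

# Assumed form: s^2 = [s_y^2 (1 - R^2) / (N - m - 1)] * [1 + sum_ij g_ij x_i x_j / (s_i s_j)].
z = x / s
se = np.sqrt(s_y2 * (1.0 - R2) / (N - m - 1) * (1.0 + z @ G2 @ z))

# Fiducial limits with the tabled t-values for 209 degrees of freedom.
limits_99 = (y_hat - 2.600 * se, y_hat + 2.600 * se)
limits_95 = (y_hat - 1.972 * se, y_hat + 1.972 * se)
```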


The Discriminant Function. The ordering of things into classes is a
basic procedure of empirical science. In fact, the rigorousness of the
basis of scientific classification is an index of the development of a field
as a science. Statistical methods are available which can be profitably
applied to the problem of discriminating between different populations
and classifying them. The aspect of the problem to be discussed here
deals with the statistical uses of multi-measurement for differentiating
between two or more groups of individuals, things, or events. This is
frequently a problem in economics, education, psychology, or in the
various fields of science. For instance, individuals upon whom several
measurements are available are to be classified into groups with a minimum
of overlapping. The traditional method is to compute the significance
of the difference between the means of groups taking each
character separately. This method is inefficient in that it does not make
possible the evaluation of the relative amount of information for differentiation
provided by the several measurements; neither does it combine
the information taking into account the interrelations, if they exist,
between the characters dealt with. From this observation, the problem
is clearly one analogous to multiple regression; that is, a weighted sum
of the measurements is needed as in multiple regression. The difference
lies in the nature of the criterion which is, in the problem discussed here,
qualitative rather than quantitative as in the case of multiple regression.
That is, the dependent variable is a dichotomy or a multiple classification.
The particular statistic for the solution of this problem, which is
called the discriminant function, was developed by Fisher (Ref. 10).
The essential property of this function, which is a linear function of the
observations, is that it will distinguish better than any other linear
function between the specified groups on whom common measurements
are available. The principle upon which the discriminant function rests
is that the linear function of the measurements will maximize the ratio
of the difference between the specific means to the standard deviations
within classes. This type of problem is also closely related to that
studied by Hotelling (Refs. 15 and 16) resulting in his generalization of
“Student's” ratio, or Hotelling's T, as it is usually called, which is a
powerful tool for testing the significance of the difference between mean values of different
multivariate normal populations under the assumptions of equal variances
and equal covariances. Closely related also is the statistic developed by
Mahalanobis (Ref. 19) and studied further by Bose and Roy (Ref. 3),
leading to the studentized form of the distribution, in statistics called the
generalized distance function, $D^2$. By the use of $D^2$, different multivariate
populations can be not merely discriminated but also classified, that is,
$D^2$ contributes both to the problem of testing significance and of estimation.
The treatment here is limited to the discriminant function.

We first present the formulation and solution of the mathematical 
problem. Then a practical application is presented. 

Two Groups. If we have samples of $N_1$ and $N_2$ observations, respectively,
and make $p$ measurements $X_1, \ldots, X_p$ on each individual, consider
first the question: What linear function of the measurements will
maximize the ratio of the difference between the means of the two classes
to the standard deviation within classes? The linear function is represented
by

$$a = \sum_i \lambda_i X_i \qquad (i = 1, \ldots, p) \tag{14.34}$$

Let the difference between means of $x_i$ be represented by $d_i$, where
$i = 1, \ldots, p$ for the $p$ measurements. Represent the sum of squares
or products from the specific means within classes by $S_{ij}$, where $i, j = 1,
\ldots, p$. Then for any linear function, $a$, of the measurements, the
difference between the means of $a$ in the two specific groups is

$$D = \sum_i \lambda_i d_i \qquad (i = 1, \ldots, p) \tag{14.35}$$

while the variance of $a$ within classes is proportional to

$$S_a = \sum_i \sum_j \lambda_i \lambda_j S_{ij} \qquad (i, j = 1, \ldots, p) \tag{14.36}$$

The particular function which best discriminates the two groups will
be one for which the ratio $D^2 / S_a$ is greatest, by variation of the $p$ coefficients,
$\lambda_1, \ldots, \lambda_p$, independently. Mathematically, we should seek
the solution for each $\lambda$:

$$\frac{\partial}{\partial \lambda} \left( \frac{D^2}{S_a} \right) = 0 \tag{14.37}$$

which reduces to

$$\frac{D}{S_a^2} \left( 2 S_a \frac{\partial D}{\partial \lambda} - D \frac{\partial S_a}{\partial \lambda} \right) = 0 \tag{14.38}$$

and consequently,

$$\frac{1}{2} \frac{\partial S_a}{\partial \lambda} = \frac{S_a}{D} \cdot \frac{\partial D}{\partial \lambda} \tag{14.39}$$

where it may be noticed that $S_a / D$ is a factor common to the $p$ unknown
$\lambda$’s. Therefore, the coefficients required are proportional to the solutions
of the normal equations:

$$\begin{aligned}
S_{11} \lambda_1 + \cdots + S_{1p} \lambda_p &= d_1 \\
&\;\;\vdots \\
S_{p1} \lambda_1 + \cdots + S_{pp} \lambda_p &= d_p
\end{aligned} \tag{14.40}$$

Let us define:

Li = V&i Xi (* = !,•••, -p ) (14.41) 


In (14.40) we divide the $i$th equation by $\sqrt{S_{ii}}$, where
$i = 1, \ldots, p$. Then, writing $r_{ij} = S_{ij}/\sqrt{S_{ii}S_{jj}}$, we
have the following set of normal equations:

    $r_{11}L_1 + \cdots + r_{1p}L_p = d_1/\sqrt{S_{11}}$
    $\cdots$
    $r_{p1}L_1 + \cdots + r_{pp}L_p = d_p/\sqrt{S_{pp}}$    (14.42)

We can easily solve (14.42) for the $L$'s by Fisher's method of auxiliary
statistics, in which unity is substituted for each of the
$d_i/\sqrt{S_{ii}}$ in turn, while the others are made equal to zero, as
follows:

    $r_{11}L_1 + \cdots + r_{1p}L_p = 1, \ldots, 0$
    $\cdots$
    $r_{p1}L_1 + \cdots + r_{pp}L_p = 0, \ldots, 1$    (14.43)
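The method of auxiliary statistics amounts to solving the same set of equations once for each unit right-hand member and collecting the solutions in a square array. The following is a minimal sketch in Python, not the hand procedure of the text; it assumes the three correlations of Problem XIV.2 below as input, and the names `solve_linear` and `auxiliary_solutions` are illustrative only.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # bring the largest remaining entry of this column to the pivot row
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [aug[i][n] for i in range(n)]


def auxiliary_solutions(r):
    """k[i][j] is the ith unknown when the jth right member is 1 and the rest 0."""
    p = len(r)
    columns = [solve_linear(r, [1.0 if i == j else 0.0 for i in range(p)])
               for j in range(p)]
    return [[columns[j][i] for j in range(p)] for i in range(p)]


# Correlations within groups, as they appear in the normal equations of Problem XIV.2
r = [[1.000000, 0.275763, 0.356796],
     [0.275763, 1.000000, 0.496589],
     [0.356796, 0.496589, 1.000000]]
k = auxiliary_solutions(r)
for row in k:
    print(["%.6f" % value for value in row])
```

Once the array `k` is available, any set of right-hand members can be handled by forming sums of products, which is the reason for solving the unit systems first.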
Let us define the means of $a$ for these two groups:

    $\bar{a}_1 = \sum_i \lambda_i \bar{X}_{1i} \qquad (i = 1, \ldots, p)$    (14.44)

    $\bar{a}_2 = \sum_i \lambda_i \bar{X}_{2i} \qquad (i = 1, \ldots, p)$    (14.45)

where $\bar{X}_{1i}$ is the mean value of $X_i$ for the first group and
$\bar{X}_{2i}$ is the mean value of $X_i$ for the second group. We wish to
test the hypothesis:

    $H_0: E(\bar{a}_1) = E(\bar{a}_2)$    (14.46)

    ($E$ is the notation for the expectation of a parameter)


that is, the hypothesis that there is no significant difference between the
two groups for the function $a$. By mathematical deductions, the sums
of squares due to "within groups" and "between groups" are

    "Within groups":   $D$, with $n_2 = N_1 + N_2 - p - 1$
    "Between groups":  $\frac{N_1 N_2}{N_1 + N_2}\,D^2$, with $n_1 = p$    (14.47)

Then the test of $H_0$ is given by

    $F = \frac{N_1 + N_2 - p - 1}{p} \cdot \frac{N_1 N_2}{N_1 + N_2} \cdot D$    (14.48)


If we reject the hypothesis, $H_0$, we may conclude that the values obtained
for the $\lambda$'s are the weights of the measurements which best
discriminate these two groups. The next problem then arises: if we have
another individual on whom the same measurements, $X_1, \ldots, X_p$, are
made, we wish to know to which group he belongs. Wald (Ref. 26) has shown
two methods for solving this problem: (1) when $N_1$ and $N_2$ are
sufficiently large, and (2) when $N_1$ and $N_2$ are small. For the time
being, we assume that $N_1$ and $N_2$ are sufficiently large. Following
Wald's criterion, let us denote by $\pi_1$ and $\pi_2$ the populations of
the first group and the second group, respectively. The hypothesis tested in
this problem is that the individual is drawn from $\pi_1$.
First we calculate:


    $\bar{a}_1 = \sum_i \sum_j S^{ij} \bar{X}_{1i} d_j = \lambda_1 \bar{X}_{11} + \cdots + \lambda_p \bar{X}_{1p} \qquad (i, j = 1, \ldots, p)$    (14.49)

    $\bar{a}_2 = \sum_i \sum_j S^{ij} \bar{X}_{2i} d_j = \lambda_1 \bar{X}_{21} + \cdots + \lambda_p \bar{X}_{2p} \qquad (i, j = 1, \ldots, p)$    (14.50)

    $U = \sum_i \sum_j S^{ij} X_i d_j = \lambda_1 X_1 + \cdots + \lambda_p X_p \qquad (i, j = 1, \ldots, p)$    (14.51)

where $\bar{a}_1$, $\bar{a}_2$, $\bar{X}_{1i}$, $\bar{X}_{2i}$, $S^{ij}$, and
$d_j$ are defined as before; $X_i$ is the value obtained by this individual
on the $i$th measurement; and $U$ is the value obtained by the individual
for the linear function $a$. Then the critical region for rejecting the
hypothesis with the least risk of both kinds of error, that is, accepting the
hypothesis when it is false and rejecting the hypothesis when it is true, is
given by

    $U \geq \frac{\bar{a}_1 + \bar{a}_2}{2}$    (14.52)

Problem XIV.2. Discrimination between two groups. There were two
classes in the College of Science, Literature and Arts in the University
of Minnesota. One class was taking the course Physics 7, which was
more advanced than Physics 1 taken by the other class. Three measurements
were available for each individual: mathematical test score,
American Council Examination (A.C.E.) test score, and honor-point ratio
(H.P.R.). Let us denote the mathematical test score by $X_1$, the A.C.E.
test score by $X_2$, and the H.P.R. by $X_3$. The calculated measures are
summarized in Table 101.


TABLE 101

Calculated Measures for Two Groups

                       Physics 1      Physics 7

    $N$                      111            257
    $\sum X_1$             9,728         23,746
    $\bar{X}_1$          87.6396        92.3969
    $\sum X_2$             3,450         14,411
    $\bar{X}_2$          31.0811        56.0739
    $\sum X_3$             128.6          326.1
    $\bar{X}_3$           1.1586         1.2689
    $\sum X_1^2$         905,694      2,388,412
    $\sum X_2^2$         118,846        823,945
    $\sum X_3^2$          200.84         534.17
    $\sum X_1 X_2$       307,220      1,349,410
    $\sum X_1 X_3$      11,756.0       31,974.6
    $\sum X_2 X_3$       4,240.8       19,122.3
    $d_1$                 4.7573
    $d_2$                24.9928
    $d_3$                 0.1103

In Table 102 are recorded the computations leading to the pooled sums
of squares and products within the two groups. In the line of totals, the
entries are the sums of squares and products for the entire 368 individuals.

In the line for groups are put down the sums of squares and products
of the group sums in Table 101, calculated in the manner characteristic
of analysis of variance and covariance. As an example, the entry for
column $X_1$ in row $X_1$ of Table 102 is

    $\frac{(9{,}728)^2}{111} + \frac{(23{,}746)^2}{257} = 3{,}046{,}614.8969$

and for column $X_2$, row $X_1$,

    $\frac{(9{,}728)(3{,}450)}{111} + \frac{(23{,}746)(14{,}411)}{257} = 1{,}633{,}888.2977$

The differences in the third line are the sums of squares and products of
deviations within groups. The calculation of the standard deviations
and the correlations now proceeds in the usual manner.

As examples,

    $s_1 = \frac{\sqrt{247{,}491.1031}}{\sqrt{366}} = 26.003942$

    $r_{12} = \frac{22{,}741.7023}{(497.4847)(165.7705)} = .275763$

The degrees of freedom used, 366, are those within the two groups,
$(N_1 - 1) + (N_2 - 1)$.
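The arithmetic behind these entries can be retraced directly from the sums in Table 101. The sketch below is offered only as an illustration of the "totals minus groups" rule described above; the dictionary layout and the function name `within` are assumptions made for the example and do not come from Table 102.

```python
from math import sqrt

n = [111, 257]                         # N for Physics 1 and Physics 7
sums = {1: [9728.0, 23746.0],          # group sums of X1
        2: [3450.0, 14411.0],          # group sums of X2
        3: [128.6, 326.1]}             # group sums of X3
raw = {(1, 1): [905694.0, 2388412.0],  # group sums of squares and products
       (2, 2): [118846.0, 823945.0],
       (3, 3): [200.84, 534.17],
       (1, 2): [307220.0, 1349410.0],
       (1, 3): [11756.0, 31974.6],
       (2, 3): [4240.8, 19122.3]}


def within(i, j):
    """Pooled within-group sum of squares (i == j) or products (i != j)."""
    total = sum(raw[(i, j)])                                   # line of totals
    groups = sum(sums[i][g] * sums[j][g] / n[g] for g in range(2))  # line for groups
    return total - groups                                      # third line


S = {key: within(*key) for key in raw}
df_within = (n[0] - 1) + (n[1] - 1)
s1 = sqrt(S[(1, 1)] / df_within)                 # standard deviation of X1
r12 = S[(1, 2)] / sqrt(S[(1, 1)] * S[(2, 2)])    # correlation of X1 and X2
print(S[(1, 1)], S[(1, 2)], s1, r12)
```

The same function gives the remaining within-group entries, so the standard deviations and correlations for $X_3$ follow in the same way.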

Calculate:

    $\frac{d_1}{\sqrt{S_{11}}} = .009563; \qquad \frac{d_2}{\sqrt{S_{22}}} = .150767; \qquad \frac{d_3}{\sqrt{S_{33}}} = .008404$

Consequently, we obtain the following set of normal equations:

    1.000000 $L_1$ +  .275763 $L_2$ +  .356796 $L_3$ = .009563
     .275763 $L_1$ + 1.000000 $L_2$ +  .496589 $L_3$ = .150767
     .356796 $L_1$ +  .496589 $L_2$ + 1.000000 $L_3$ = .008404

The solutions for $L_1$, $L_2$, and $L_3$ are carried out in Table 103.

In Table 103 a convenient check column is often carried along, to the
right of these computations. The first and second entries of this check
column are found by adding all other entries in their respective rows.
The third entry is found in two ways, thus yielding a check on the
accuracy of the arithmetical computations. The first way consists of
addition of all entries in the third row; the other way consists of
operating on the first entry in the check column in accordance with the
directions given at the left. The other entries in the check column are
found in a similar way.

The values of $k_{31}$, $k_{32}$, and $k_{33}$ can be read directly$^7$ from
the last row, (10), in Table 103:

    $k_{31} = -.339403; \qquad k_{32} = -.614720; \qquad k_{33} = 1.426361$


$^7$ The $k$-values are used in the calculation of the $L$'s, as noted on page 351.
In order to obtain $k_{21}$, $k_{22}$, and $k_{23}$:

Substitute $k_{31}$ in Eq. (5), Table 103, using column (D) in the right
member:

    $k_{21}$ + .430971(-.339403) = -.298459
    $k_{21}$ = -.152186

Substitute $k_{32}$ in Eq. (5), Table 103, using (D') in the right member:

    $k_{22}$ + .430971(-.614720) = 1.082304
    $k_{22}$ = 1.347230

Substitute $k_{33}$ in Eq. (5), Table 103, using (D'') in the right member:

    $k_{23}$ + .430971(1.426361) = 0
    $k_{23}$ = -.614720

To obtain $k_{11}$, $k_{12}$, and $k_{13}$:

Substitute $k_{21}$ and $k_{31}$ in Eq. (1), Table 103, using (D):

    $k_{11}$ + .275763(-.152186) + .356796(-.339403) = 1
    $k_{11}$ = 1.163065

Substitute $k_{22}$ and $k_{32}$ in Eq. (1), Table 103, using (D'):

    $k_{12}$ + .275763(1.347230) + .356796(-.614720) = 0
    $k_{12}$ = -.152186

Substitute $k_{23}$ and $k_{33}$ in Eq. (1), Table 103, using (D''):

    $k_{13}$ + .275763(-.614720) + .356796(1.426361) = 0
    $k_{13}$ = -.339403


It is noted that

    $k_{ij} = k_{ji} \qquad (i \neq j;\ i, j = 1, 2, 3)$

This is a good check on the calculation of $k_{ij}$ $(i \neq j)$. The check
of $k_{ii}$ $(i = 1, 2, 3)$ can be carried out easily. To obtain $k_{11}$,
the sum of products of the last two numbers (regardless of sign) in each
section in column (D) is found. For $k_{22}$ do the same in column (D'),
and so on. We have

    $k_{11}$ = 1.000000 + .275763(.298459) + .237950(.339403) = 1.163065
    $k_{22}$ = 1.000000(1.082304) + .430971(.614720) = 1.347230
    $k_{33}$ = 1.000000(1.426361) = 1.426361


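The values of $L_1$, $L_2$, and $L_3$ are obtained by calculating the
following equations:

    $L_1 = \frac{d_1}{\sqrt{S_{11}}}\,k_{11} + \frac{d_2}{\sqrt{S_{22}}}\,k_{12} + \frac{d_3}{\sqrt{S_{33}}}\,k_{13}$

    $L_2 = \frac{d_1}{\sqrt{S_{11}}}\,k_{21} + \frac{d_2}{\sqrt{S_{22}}}\,k_{22} + \frac{d_3}{\sqrt{S_{33}}}\,k_{23}$

    $L_3 = \frac{d_1}{\sqrt{S_{11}}}\,k_{31} + \frac{d_2}{\sqrt{S_{22}}}\,k_{32} + \frac{d_3}{\sqrt{S_{33}}}\,k_{33}$

Consequently, the values of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are
obtained by calculating the following equations:

    $\lambda_1 = \frac{L_1}{\sqrt{S_{11}}}; \qquad \lambda_2 = \frac{L_2}{\sqrt{S_{22}}}; \qquad \lambda_3 = \frac{L_3}{\sqrt{S_{33}}}$
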

All these values are shown in Table 103. 
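For readers who wish to retrace these sums of products, a minimal Python sketch follows. It takes the $k$-values and differences quoted in the text as given; $\sqrt{S_{33}}$ is not quoted in the passage, so it is recomputed here from the Table 101 sums, which is an assumption of this sketch rather than a step shown in Table 103. The variable names are illustrative.

```python
from math import sqrt

# k-values as obtained from Table 103
k = [[1.163065, -0.152186, -0.339403],
     [-0.152186, 1.347230, -0.614720],
     [-0.339403, -0.614720, 1.426361]]
d = [4.7573, 24.9928, 0.1103]          # differences between group means

# sqrt(S11) and sqrt(S22) are quoted in the text; S33 is rebuilt from Table 101
S33 = (200.84 + 534.17) - (128.6 ** 2 / 111 + 326.1 ** 2 / 257)
root_S = [497.4847, 165.7705, sqrt(S33)]

c = [d[i] / root_S[i] for i in range(3)]                  # right members of the normal equations
L = [sum(k[i][j] * c[j] for j in range(3)) for i in range(3)]
lam = [L[i] / root_S[i] for i in range(3)]                # weights lambda_1 .. lambda_3

# the two equations for D should agree with each other
D_from_L = sum(L[i] * c[i] for i in range(3))
D_from_lam = sum(lam[i] * d[i] for i in range(3))
print(lam, D_from_L, D_from_lam)
```

Agreement between the two values of $D$ checks the arithmetic only; it does not test the $k$-values themselves, for which the symmetry check remains the safeguard.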

The next step is to calculate the value of $D$. As a check we can use
two equations:

    $D = L_1 \frac{d_1}{\sqrt{S_{11}}} + L_2 \frac{d_2}{\sqrt{S_{22}}} + L_3 \frac{d_3}{\sqrt{S_{33}}}$

    $D = \lambda_1 d_1 + \lambda_2 d_2 + \lambda_3 d_3$

In our case:

    $D = .028779$

This value is also the "within" sum of squares. The "between" sum of
squares is

    $\frac{N_1 N_2}{N_1 + N_2}\,D^2 = .064186$

The test of significance between the two groups on the variable $a$ is given
in Table 104.


TABLE 104

Analysis of Variance of $a$ between and within Groups

    Source of variation    D.F.    S.S.       M.S.    F          Hypothesis

    Within groups          364     .028779
    Between groups           3     .064186            270.617    Rejected

    Total                  367     .092965


Referring to the $F$-table with $n_1 = 3$ and $n_2 = 364$, we find $p < .01$.
Therefore, we reject the hypothesis of homogeneous groups; and the
relative value of the variable $a$ for discriminating between groups is
apparently indicated by the weights of the different measurements:

    $\lambda_1 = -.00002950; \qquad \lambda_2 = .00118535; \qquad \lambda_3 = -.00639576$
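The entries of Table 104 can be retraced with a few lines of arithmetic. The sketch below simply applies the "within" and "between" expressions and the ratio of mean squares to the rounded value $D = .028779$; because of that rounding, the results agree with the printed .064186 and 270.617 only to about three significant figures. Variable names are illustrative.

```python
N1, N2, p = 111, 257, 3
D = 0.028779                               # rounded value from the text

n1 = p                                     # degrees of freedom between groups
n2 = N1 + N2 - p - 1                       # degrees of freedom within groups
within_ss = D
between_ss = N1 * N2 / (N1 + N2) * D ** 2

# F as the ratio of the two mean squares
F = (between_ss / n1) / (within_ss / n2)

# the same quantity written as the single product of (14.48)
F_direct = (N1 + N2 - p - 1) / p * (N1 * N2 / (N1 + N2)) * D
print(n1, n2, between_ss, F, F_direct)
```

Both forms of $F$ are algebraically identical, so any difference between them in the output reflects floating-point rounding only.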

Now suppose an individual is given these same measurements and
obtains

    $X_1 = 80; \qquad X_2 = 40; \qquad X_3 = 1.5$

We wish to know to which group this individual should be assigned.
First, we calculate

    $\bar{a}_1$ = -.00002950(87.6396) + .00118535(31.0811) - .00639576(1.1586)
                = .026846

    $\bar{a}_2$ = -.00002950(92.3969) + .00118535(56.0739) - .00639576(1.2689)
                = .055626

    $U$ = -.00002950(80) + .00118535(40) - .00639576(1.5) = .035460

    $\frac{\bar{a}_1 + \bar{a}_2}{2} = .041236$

It is evident that $U < \frac{\bar{a}_1 + \bar{a}_2}{2}$.

Therefore, we may conclude that this individual should be assigned to the
class Physics 1.
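The assignment just made can be written as a short routine. The following Python sketch is an illustration under the large-sample rule used above, with the midpoint of the two group means of $a$ as the dividing point; the function name `score` and the group labels are assumptions of the example, and the rule as coded presumes that the mean for Physics 1 is the smaller of the two, as it is here.

```python
lam = [-0.00002950, 0.00118535, -0.00639576]   # weights from the text
mean_physics1 = [87.6396, 31.0811, 1.1586]     # group means, Table 101
mean_physics7 = [92.3969, 56.0739, 1.2689]
individual = [80, 40, 1.5]                     # new individual's measurements


def score(values):
    """Value of the linear function a for one set of measurements."""
    return sum(weight * x for weight, x in zip(lam, values))


a1 = score(mean_physics1)
a2 = score(mean_physics7)
U = score(individual)
midpoint = (a1 + a2) / 2

# U below the midpoint falls outside the critical region, so the individual stays in group 1
group = "Physics 1" if U < midpoint else "Physics 7"
print(a1, a2, U, midpoint, group)
```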

Problems 

1. What methods of multivariate analysis other than those reported in
this chapter are available? Which of these are applicable to problems
in the field of your interest? [In this connection, see Tintner,
Gerhard, "Some Applications of Multivariate Analyses to Economic
Data," Journal of the American Statistical Association, Vol. 41, pp.
472-500 (December, 1946).]

2. Specify the problem of factor analysis in psychology as a special
application of the theory of regression. [See Holzinger, K. J., and Harmon,
H. H., Factor Analysis (University of Chicago Press, 1941); Thompson,
G. H., The Factorial Analysis of Human Ability (Houghton Mifflin
Company, 2d ed., 1946); Thurstone, L. L., Multiple-Factor Analysis:
A Development and Expansion of the Vectors of Mind (University of
Chicago Press, 1947).]

3. The following data for a random sample of 50 students were taken from
a study dealing with the prediction of achievement of freshmen in a
particular college of the University of Minnesota:

$Y_1$ = honor-point ratio at the end of the fall quarter
$Y_2$ = honor-point ratio at the end of the freshman year
$X_1$ = score on Johnson Science Application Test
$X_2$ = score on an English test
$X_3$ = score on the Cooperative Algebra Test
$X_4$ = percentile rank in high-school graduation class transformed
to probits

In this problem you are to do the following:

(a) Set up the multiple regression equation for predicting either $Y_1$
or $Y_2$ from $X_1$, $X_2$, $X_3$, and $X_4$.

(b) Test the significance of the

(1) Standard partial regression coefficients (the betas).

(2) Multiple correlation coefficient.

(3) Differences between the respective betas.

(c) Set up a new multiple regression equation eliminating the
independent variable or variables that are not statistically significant.

(d) Repeat (b).

(e) Calculate the standard error of the predicted score and set up
the confidence interval, with a confidence coefficient of 98 per cent,
for Students 8, 25, 43, and 47.
    Student   $Y_1$         $Y_2$         $X_1$      $X_2$      $X_3$      $X_4$
    No.       Honor-point   Honor-point   Johnson    Coop.      Algebra    High-school P.R.
              ratio         ratio         Science    English               converted into
                                                                           S.D. units
              (1)           (2)           (3)        (4)        (5)        (6)

     1         1.65          1.57          56          80        46        4.72
     2         1.29          1.38          34         113        48        5.64
     3          .88          1.15          32          94        75        4.92
     4         1.29           .11          55          47        32        6.48
     5          .94           .54          37         126        59        5.05
     6          .80           .83          32          81         7        4.53
     7          .46           .50          33         115        34        5.95
     8         1.00           .85          62         148        58        6.65
     9          .72           .06          28          84        36        4.39
    10          .31           .76          41         119        16        4.33
    11          .13           .64          20          69         7        5.03
    12          .20           .39          47          77        24        4.77
    13          .44           .70          33         106        16        4.48
    14         1.00          1.49          44          80        31        4.69
    15          .21           .60          41          84        28        5.05
    16         1.27          1.67          28          79        15        4.67
    17         1.06          1.65          47         109        64        5.10
    18          .71           .84          50          92        23        4.23
    19         -.07          -.24          31          93        20        3.44
    20         1.65          1.43          43          74        32        5.81
    21         1.59           .50          59          87        58        4.87
    22         -.12           .43          38          95        14        4.01
    23          .27           .35          29          72        38        3.77
    24          .12           .41          27         106        18        5.10
    25          .00           .41          38          71        22        5.00
    26         1.12          1.12          40         122        12        5.39
    27          .29           .37          41          84        26        4.77
    28         1.00          1.12          46         123        32        4.29
    29         1.31           .98          55         111        24        5.95
    30         1.56          1.14          52          86        15        4.87
    31         1.71          1.08          46          76        17        5.71
    32          .13           .33          48         111        23        4.95
    33          .53          1.06          59         105        61        5.81
    34          .12           .60          25          98        45        4.50
    35          .29           .69          42          72         0        5.67
    36          .75           .58          39         115        10        5.99
    37          .09           .17          37         116        40        4.75
    38          .73           .85          28          49        21        5.81
    39          .09           .17          37         116        40        4.75
    40         1.41           .84          62          89        50        4.69
    41          .24           .11          24          58        13        5.00
    42          .24           .48          32         117        17        4.77
    43         2.00          1.56          47          72        15        5.25
    44          .07           .18          45         115        24        4.64
    45         1.00          1.30          52          86        44        5.15
    46         1.57          1.67          65         147        84        6.08
    47         -.62          -.77          31          98        36        2.67
    48          .77           .21          54         129        64        4.12
    49         1.47          1.33          51         131         5        5.84
    50          .56          0.00          30          75        21        4.12
4. The following statistics were derived from data collected in a study
dealing with the relation between instruction in a course in college
biology and the students' belief in the efficacy of certain commercial
preparations and home remedies. The criterion was the score on the
test, $Y$. The independent variates, $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$,
are specified below. In this problem:

(a) Set up the multiple regression equation for estimating $Y$ from $X_1$,
$X_2$, $X_3$, $X_4$, and $X_5$.

(b) Test the significance of the

(1) Standard partial regression coefficients.

(2) Multiple correlation coefficient.

(c) Of the variance of the dependent variate $Y$ accounted for by the
combined effect of the independent variates, calculate the
proportion assignable to each of the independent variates. (See
Johnson, Palmer O., "The Differential Function of Examinations,"
Journal of Educational Research, Vol. 30 (1936), pp. 93-103.)

(d) Test the significance of the difference between the two largest
partial regression coefficients.

(e) Find the 5 per cent fiducial limits for the largest partial regression
coefficient.

(f) Calculate the partial correlation coefficient, $r_{YX_1 \cdot X_5}$, and
test its significance.


Zero order correlations:

    $r_{12}$ = .452
    $r_{13}$ = .303    $r_{23}$ = .638
    $r_{14}$ = .324    $r_{24}$ = .274    $r_{34}$ = .171
    $r_{15}$ = .147    $r_{25}$ = .326    $r_{35}$ = .190    $r_{45}$ = .189
    $r_{1Y}$ = .514    $r_{2Y}$ = .621    $r_{3Y}$ = .542    [$r_{4Y}$ and $r_{5Y}$ illegible]

    $s_1$ = 6.50       $\bar{X}_1$ = 22.52
    $s_2$ = 23.78      $\bar{X}_2$ = 80.54
    $s_3$ = 4.20       $\bar{X}_3$ = 23.4
    $s_4$ = 0.956      $\bar{X}_4$ = 4.95
    $s_5$ = 1.00       $\bar{X}_5$ = 5.60
    $s_Y$ = 4.89       $\bar{Y}$ = 32.08

Where $Y$ = score on application in hygiene
      $X_1$ = score on test of facts and principles in hygiene
      $X_2$ = score on vocabulary test in hygiene
      $X_3$ = score on final examination in hygiene
      $X_4$ = transformed high-school percentile ranks
      $X_5$ = transformed College Aptitude Test percentile ranks

5. The following data were collected on two groups of students in an
experimental investigation of the relative efficacy of two different
methods in teaching agricultural chemistry at the high-school level.
Compare the two groups with respect to the set of multiple measurements
made at the beginning of the experiment.


Topical Assignment Group

    Pupil    $X_1$    $X_2$    $X_3$    $X_4$

      1       110     1.37      38       27
      2        81     1.62      18        3
      3       111     2.49      26       10
      4       110     1.96      20        7
      5        95     0.86      15        8
      6        85     0.56      14       10
      7        97     1.38      25        9
      8        90     0.25      13        3
      9        85     0.51      21       11
     10        83     0.78      21        5
     11        83     1.15      22        7
     12       100     2.24      31       15
     13       106     0.72      22        3
     14        92     1.36      20        6
     15        94     1.25      16       11
     16        96     0.62      12        9
     17        83     0.90      16        8
     18       113     2.65      28        8
     19       104     1.61      21       11
     20        93     1.51      20       10


Discussion Group

    Pupil    $X_1$    $X_2$    $X_3$    $X_4$

      1       103     1.50      26       15
      2       115     1.74      28       12
      3       104     1.36      25        7
      4        85     0.25      20        3
      5        84     0.53      17        5
      6        87     0.25      23        6
      7        93     2.00      34        6
      8       112     1.34      24        4
      9       123     2.64      44       16
     10       106     0.75      20       10
     11        99     2.11      24       13
     12        80     0.45      22        7
     13       112     1.96      40       16
     14        91     1.19      17        5
     15        77     0.42      14        6
     16        96     1.68      20       11
     17        85     0.90      15        2
     18       115     1.65      22        7
     19       117     1.75      26       11


$X_1$ = Intelligence quotient based on Kuhlman-Anderson Tests.

$X_2$ = Honor-point ratio of previous year's work.

$X_3$ = Score on pretest of knowledge of facts and principles examination
administered at the beginning of the term.

$X_4$ = Score on pretest of Glenn-Welton Chemistry Achievement Test
administered at the beginning of the year.
APPENDIX


TABLE I*

Proportion of the Cases in a Normal Distribution Lying below Certain
Values of the Abscissa

    Abscissa          Proportion    Abscissa          Proportion    Abscissa          Proportion
    $z = (X - M)/S$   of cases      $z = (X - M)/S$   of cases      $z = (X - M)/S$   of cases
                      below $z$                       below $z$                       below $z$

     .00              .5000         1.25              .8944         2.50              .9938
     .05              .5199         1.30              .9032         2.55              .9946
     .10              .5398         1.35              .9115         2.60              .9953
     .15              .5596         1.40              .9192         2.65              .9960
     .20              .5793         1.45              .9265         2.70              .9965
     .25              .5987         1.50              .9332         2.75              .9970
     .30              .6179         1.55              .9394         2.80              .9974
     .35              .6368         1.60              .9452         2.85              .9978
     .40              .6554         1.65              .9505         2.90              .9981
     .45              .6736         1.70              .9554         2.95              .9984
     .50              .6915         1.75              .9599         3.00              .9987
     .55              .7088         1.80              .9641         3.05              .9989
     .60              .7257         1.85              .9678         3.10              .9990
     .65              .7422         1.90              .9713         3.15              .9992
     .70              .7580         1.95              .9744         3.20              .9993
     .75              .7734         2.00              .9772         3.25              .9994
     .80              .7881         2.05              .9798         3.30              .9995
     .85              .8023         2.10              .9821         3.35              .9996
     .90              .8159         2.15              .9842         3.40              .9997
     .95              .8289         2.20              .9861         3.45              .9997
    1.00              .8413         2.25              .9878         3.50              .9998
    1.05              .8531         2.30              .9893         3.55              .9998
    1.10              .8643         2.35              .9906         3.60              .9998
    1.15              .8749         2.40              .9918         3.65              .9999
    1.20              .8849         2.45              .9929         3.70              .9999


* Table arranged by Dr. Robert W. B. Jackson and used with his permission. For the extended table
and for other tables of the normal curve the reader is referred to the tables given by Karl Pearson in Tables
for Statisticians and Biometricians, Part I, issued by the Biometric Laboratory, University College,
London.
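Entries of Table I for abscissas not listed can be obtained from the error function. The short Python sketch below is an illustration only, using the standard identity between the normal distribution function and `math.erf`; the function name `proportion_below` is chosen for the example.

```python
from math import erf, sqrt


def proportion_below(z):
    """Proportion of cases in a normal distribution lying below abscissa z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# compare a few computed values with the tabled ones
for z in (0.00, 1.25, 2.50):
    print("%.2f  %.4f" % (z, proportion_below(z)))
```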
TABLE II*

Distribution of t

                                                    Probability

      n     .9     .8     .7     .6     .5     .4     .3     .2     .1     .05     .02     .01    .001

      1   .158   .325   .510   .727  1.000  1.376  1.963  3.078  6.314  12.706  31.821  63.657  636.619
      2   .142   .289   .445   .617   .816  1.061  1.386  1.886  2.920   4.303   6.965   9.925   31.598
      3   .137   .277   .424   .584   .765   .978  1.250  1.638  2.353   3.182   4.541   5.841   12.941
      4   .134   .271   .414   .569   .741   .941  1.190  1.533  2.132   2.776   3.747   4.604    8.610
      5   .132   .267   .408   .559   .727   .920  1.156  1.476  2.015   2.571   3.365   4.032    6.859
      6   .131   .265   .404   .553   .718   .906  1.134  1.440  1.943   2.447   3.143   3.707    5.959
      7   .130   .263   .402   .549   .711   .896  1.119  1.415  1.895   2.365   2.998   3.499    5.405
      8   .130   .262   .399   .546   .706   .889  1.108  1.397  1.860   2.306   2.896   3.355    5.041
      9   .129   .261   .398   .543   .703   .883  1.100  1.383  1.833   2.262   2.821   3.250    4.781
     10   .129   .260   .397   .542   .700   .879  1.093  1.372  1.812   2.228   2.764   3.169    4.587
     11   .129   .260   .396   .540   .697   .876  1.088  1.363  1.796   2.201   2.718   3.106    4.437
     12   .128   .259   .395   .539   .695   .873  1.083  1.356  1.782   2.179   2.681   3.055    4.318
     13   .128   .259   .394   .538   .694   .870  1.079  1.350  1.771   2.160   2.650   3.012    4.221
     14   .128   .258   .393   .537   .692   .868  1.076  1.345  1.761   2.145   2.624   2.977    4.140
     15   .128   .258   .393   .536   .691   .866  1.074  1.341  1.753   2.131   2.602   2.947    4.073
     16   .128   .258   .392   .535   .690   .865  1.071  1.337  1.746   2.120   2.583   2.921    4.015
     17   .128   .257   .392   .534   .689   .863  1.069  1.333  1.740   2.110   2.567   2.898    3.965
     18   .127   .257   .392   .534   .688   .862  1.067  1.330  1.734   2.101   2.552   2.878    3.922
     19   .127   .257   .391   .533   .688   .861  1.066  1.328  1.729   2.093   2.539   2.861    3.883
     20   .127   .257   .391   .533   .687   .860  1.064  1.325  1.725   2.086   2.528   2.845    3.850
     21   .127   .257   .391   .532   .686   .859  1.063  1.323  1.721   2.080   2.518   2.831    3.819
     22   .127   .256   .390   .532   .686   .858  1.061  1.321  1.717   2.074   2.508   2.819    3.792
     23   .127   .256   .390   .532   .685   .858  1.060  1.319  1.714   2.069   2.500   2.807    3.767
     24   .127   .256   .390   .531   .685   .857  1.059  1.318  1.711   2.064   2.492   2.797    3.745
     25   .127   .256   .390   .531   .684   .856  1.058  1.316  1.708   2.060   2.485   2.787    3.725
     26   .127   .256   .390   .531   .684   .856  1.058  1.315  1.706   2.056   2.479   2.779    3.707
     27   .127   .256   .389   .531   .684   .855  1.057  1.314  1.703   2.052   2.473   2.771    3.690
     28   .127   .256   .389   .530   .683   .855  1.056  1.313  1.701   2.048   2.467   2.763    3.674
     29   .127   .256   .389   .530   .683   .854  1.055  1.311  1.699   2.045   2.462   2.756    3.659
     30   .127   .256   .389   .530   .683   .854  1.055  1.310  1.697   2.042   2.457   2.750    3.646
     40   .126   .255   .388   .529   .681   .851  1.050  1.303  1.684   2.021   2.423   2.704    3.551
     60   .126   .254   .387   .527   .679   .848  1.046  1.296  1.671   2.000   2.390   2.660    3.460
    120   .126   .254   .386   .526   .677   .845  1.041  1.289  1.658   1.980   2.358   2.617    3.373
    $\infty$   .126   .253   .385   .524   .674   .842  1.036  1.282  1.645   1.960   2.326   2.576    3.291


* Table II is reprinted from Table III, Distribution of t, in Fisher and Yates, Statistical Tables for
Biological, Medical and Agricultural Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the
authors and publishers.
For larger values of $n$, the expression $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be used as a normal deviate with unit variance.

* Table III is reprinted from Table IV, Distribution of $\chi^2$, in Fisher and Yates, Statistical Tables for Biological, Medical and Agricultural Research, Oliver & Boyd, Ltd.,
Edinburgh, by permission of the authors and publishers.





















TABLE IV* 

5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F
The function, $F = e^{2z}$, is computed in part from Fisher's table VI (7). Additional entries are by interpolation, mostly graphical.

* This table is reproduced from Table 10.7 in Statistical Methods (4th ed.), 1946, with the permission of Professor George W. Snedecor and the publishers, The Iowa State 
College Press, Ames, Iowa. 
TABLE V*

Test for Significance of Differences in Variance among K Samples Each of
Size n (P. P. N. Nayer's Tables)

5% limits of $L_1$

     f      1     2     3     4     5     6     7     8     9    11    14    19    29    59  $\infty$
     n      2     3     4     5     6     7     8     9    10    12    15    20    30    60  $\infty$
     k

     2   .079  .312  ....  .585  .656  .708  .745  .775  .798  .833  .868  ....  .935  .968  1.000
     3   ....  ....  ....  .576  .648  .700  .739  .769  .792  .828  ....  .898  .933  .967  1.000
     4   ....  .315  ....  .585  .656  ....  .744  .774  .797  .832  .866  ....  .934  .967  1.000
     5   ....  .328  .491  .595  .665  .714  .751  .780  ....  .836  ....  .903  .936  .968  1.000
     6   ....  .339  .502  .604  .673  .721  .757  .785  .808  .841  .873  .906  .938  .969  1.000
     7   ....  ....  .512  .612  ....  .727  ....  ....  .812  .844  .876  ....  .939  ....  1.000
     8   ....  .359  .520  ....  .686  .733  .768  .795  .816  .848  .879  ....  .941  .971  1.000
     9   ....  .367  .527  .626  .691  .738  .772  .798  .819  .851  .881  .912  .942  .971  1.000
    10   .117  .374  .534  .631  .696  .742  .776  ....  .822  .853  .883  .913  .943  .972  1.000
    12   .124  .387  .545  .641  .704  .749  .782  ....  .828  .857  .887  .916  .944  .973  1.000
    14   .130  .397  .554  .649  .711  .755  .787  .812  .832  .861  ....  .918  .946  .973  1.000
    16   .136  ....  .561  .655  .716  .759  .791  .816  .835  .863  .892  ....  .947  .974  1.000
    18   .142  .412  .567  ....  .721  .763  .795  ....  .838  .866  .894  .921  .948  .974  1.000
    20   .147  .418  .573  .665  .725  .767  .798  ....  .840  .868  .896  .922  .949  .975  1.000
    22   .152  .424  .577  .669  .728  .770  ....  .824  .843  ....  .897  .924  .950  .975  1.000
    24   .156  .428  .581  .672  .731  .772  ....  .826  .844  .872  .898  .924  .950  .975  1.000
    26   .160  .433  .585  .675  .734  .775  .805  .828  .846  .873  .899  .925  .951  .976  1.000
    28   .163  .437  .589  .678  .736  .777  ....  .829  .848  .874  ....  .926  .951  .976  1.000
    30   .166  .441  .592  .681  .739  .779  ....  .831  .849  .876  .901  .927  .952  .976  1.000

Criterion $L_1 = \left(\prod_{t=1}^{k} s_t^2\right)^{1/k} \Big/ \frac{1}{k}\sum_{t=1}^{k} s_t^2$, where $s_t^2 = \frac{1}{n}\sum (X - \bar{X}_t)^2$. Note: For $n = 2$,
$k = 60$ and $L_{.05} = .187$.
TABLE V (Continued)

1% limits of $L_1$

     f      1     2     3     4     5     6     7     8     9    11    14    19    29    59  $\infty$
     n      2     3     4     5     6     7     8     9    10    12    15    20    30    60  $\infty$
     k

     2   .016  .141  .284  .398  .485  .551  ....  .645  .678  ....  .783  .836  .890  .945  1.000
     3   ....  .162  .314  .429  .514  .578  .628  .667  .699  .748  .798  .848  .898  .949  1.000
     4   ....  .188  .345  .459  .542  ....  .652  .689  .719  ....  .812  .859  ....  .953  1.000
     5   ....  ....  .370  .484  ....  ....  ....  ....  .735  .779  .823  .867  .911  .956  1.000
     6   ....  ....  .391  .504  .583  .641  .685  ....  .748  .789  .832  .874  .916  .958  1.000
     7   ....  .246  .409  .520  .597  .654  ....  .730  .757  .798  .839  .879  ....  ....  1.000
     8   ....  ....  .424  .534  ....  .665  ....  ....  .766  .805  .844  .884  .923  .962  1.000
     9   ....  .273  .437  .545  ....  .674  .715  .747  .773  .811  .849  .887  .925  .963  1.000
    10   .063  .284  .448  .555  .629  .682  .722  .753  .779  .816  .853  .890  .927  .964  1.000
    12   .071  ....  .467  .572  .644  .696  .734  .764  .789  .824  ....  .896  .931  .966  1.000
    14   .079  .318  .481  .585  .655  ....  .744  .773  .796  .831  .865  ....  .933  .967  1.000
    16   ....  .331  .493  .596  .665  .714  .751  .779  ....  .836  ....  ....  .936  .968  1.000
    18   .093  .342  ....  ....  .672  .721  .756  .784  ....  .840  .873  ....  .937  .969  1.000
    20   ....  .352  .512  .613  .679  .727  .761  .788  .811  .844  .876  .908  .939  .970  1.000
    22   ....  .360  .520  .619  .684  .732  .765  .792  .814  .847  .878  ....  .940  .970  1.000
    24   ....  .367  ....  .624  .688  .736  .768  .795  .817  ....  .880  .911  .941  .971  1.000
    26   .115  .373  .532  ....  .693  ....  .772  .798  .820  .852  .882  .912  .942  .971  1.000
    28   .119  .379  .537  ....  .697  .744  .776  ....  .823  .854  .884  .914  .943  .972  1.000
    30   .123  .386  .543  ....  .703  .748  .781  .806  .827  .856  .886  .915  .944  .972  1.000

Note: For $n = 2$, $k = 50$ and $L_{.01} = .151$.


* These tables are reproduced, by permission, from Statistical Research Memoirs, Vol. I, edited by J.
Neyman and E. S. Pearson and issued by the Department of Statistics, University of London, University
College, London.
INDEX


A 

Agreement, coefficient of, 177 
calculation of, 178 
significance of, 178-179 
Aitken, A. C., 167 
Alexander, H. W., 147, 325 
Amount of information, see Information 
Analysis of variance: 
application of, to testing: 

differential educational development 
by grades, 246-252 
homogeneity of multiple groups of 
measurements, 231-234 
independence of mental ages of 
twins, 226-230 

linearity of regression of final on 
initial scores, 241-246 
assumptions underlying, 164, 212, 218, 
219, 226 

compared with traditional biometric 
method, 216 

division of degrees of freedom in, 214, 
215 

division of sums of squares in, 215, 216, 
220 

experimental and sampling designs 
dependent on, 210 
F-test, or z-test in, 54, 214 
interaction in, 222, 224, 265 
k -way classification, 224 

the solution for the sum of squares, 
224, 225 

one-way classification, 219 
maximum likelihood solution of, 219 
hypothesis tested, 220 
randomization in, 164 
two-way classification, 221 

maximum likelihood solution of, 221 
hypothesis tested, 222, 223 
unequal representation in the subclasses in, 260-261
Analysis of variance and covariance:
application to testing differential educational development by grade,
252-260
complete procedure for analysis with one independent variable, 252-255
complete procedure for analysis with two independent variables, 256-260
application to testing equality of grade means and school means on a
speed of reading test (approximate method for unequal frequencies in
subclasses of two classifications), 261-265
application to testing identical twin achievement when inequality in
mental age is eliminated, 235-240
Analysis of covariance: 
assumptions underlying, 235 
principles of, 216, 235 
process of, 216 
purpose of, 216, 311 
Analysis of variation: 
application of, 211-216 
assignable causes of, 210 
chance causes of, 210 
fundamental problem in, 211 
hypothesis tested in, 213 
role of statistics in, 212 
test of significance in, 213, 214, 216 
Ancillary estimation, 107 
Anderson, R. L., 325 
Arbitrary corrections, 277 
Arithmetic mean, see Mean, arithmetic 
Assumptions, testing of, 17, 31 
in analysis of variance, 212, 226, 280 
in equivalent-form method, 127, 128 
in experimental design, 284 
in ranking, 166, 169, 170 
in sampling, 199 
in split-test method, 127 
underlying most statistical methods, 
155 

underlying product-moment coefficient 
of correlation, 241 

Attitudes, measuring intensity of, 183 

B 

Bacon, Sir Francis, 62 
Bartlett, M. S., 102, 167, 356 
χ² for multiple classification, 94
testing homogeneity of variances, 83 
Baxter, Brent, 325 
Bayes’s postulate, 25 
Bayes’s theorem, 24 
Beall, Geoffrey, 356 




Behrens-Fisher test, 73 
Behrens, W. U., 102 
Bernoulli, theorem of, 20
Beta coefficient: 
measure of kurtosis, 149, 151 
measure of skewness, 149, 151 
sampling distribution of, 151 
Beta function, incomplete, 118 
Bias in: 

estimation, 39, 41, 281 
sampling, 188, 197 
statistical tests, 167 
Binomial distribution, 25, 26 
limiting form of, 27 
moments of, 56, 57, 58 
Biserial phi-coefficient, 146 
Biserial r, 146 
Bishop, D. J., 102 
Bliss, C. I., 168, 356 
probit in testing normality, 160 
transforming ranks, 166 
Boltzmann, L., 5 

Bose, R. C., distribution of D², 344, 356

Brandt, A. E., 102 

Brandt and Snedecor, method of calculating
χ², 94

C 

Carlson, W. S., 325 
Census of population, 185 
Central Limit Theorem, 27 
Chapin, F. Stuart, 326 
Chi-square, see χ²
χ²-distribution:

combining independent tests by, 170-172

correction for continuity in, 94 
curve of, 42 

in r × c contingency tables, 94
in 2 × 2 tables, 91, 93
in testing: 

agreement of observation and hypothesis, 96

goodness of fit, 37, 39, 42, 45, 48, 51, 
56, 205, 206 

homogeneity of frequency distributions, 94

hypotheses, 91, 96 
normality, 149 
principle of classification, 91 
properties of, for estimation, 116 
separating individual degrees of freedom in, 95

Circular triads, in preferences, 177 
Classification, 343 

Cochran-Cox, test for equality of means, 
74-75 


Cochran, W. G., 208, 225, 356 
on correction for continuity, 94, 102 
on log transformation, 163, 168 
on subsampling, 192 
Cohen, J. B., 208 
Collar, A. R., 147 
Complex experiments, 276, 296 
Concordance, coefficient of, 174 (see also m-rankings)
testing significance of, 175 
Confidence coefficient, 111 
Confidence interval: 
compared with fiducial limits, 112 
compared with tolerance limits, 123 
for coefficient of correlation, 141 
for difference in percentages: 
on different samples, 120 
on same sample, 119 
for individual's true score, 117 
for mean, population variance known, 
114 

for median, 117 
for one parameter, 111 
for several parameters, 112 
principle of selection, 111 
theory of, 110-112 
Confidence region, 112 
Consistence, coefficient of, 176 
significance of, 177 
of choices, 180 
Consistent, see Estimation 
Control(s):

in experimentations, 278 
in purposive sampling, 191 
Cornell, F. G., 143, 208 
Correlation, coefficient of product-moment:

combining estimates of, 53 
confidence interval for, 141 
Fisher’s transformation of, 52 
maximum-likelihood estimate of, 123-125

sampling distribution of, 48-52 
standard error of, 51 
tables of (David), 141 
testing assumptions underlying, 241-243

testing significance of, on different 
samples, 53, 86 

testing significance of, on same sample,
54, 87 

Correlation coefficient, multiple, 328 
equation of, 329 

testing significance of, 338, 339, 341 
Correlation coefficient, partial, 355 
Correlation, intra-class, 230
Correlation, rank, Spearman’s coefficient 
of, 169 

as a test of significance, 170 





Correlation, rank (cont.):

testing significance of, 170 
Correlation ratio, 147 
for ranked data, 173 
Cowden, D. J., 208 
Cox, G. M., 275, 357 
Craig, A. T., 208 
Criteria: 

of normality, 149, 153 
of optimum estimates, 105 
Critical region, see Statistical hypotheses 
Crump, S. L., 275 
Curtiss, J. H., 164, 168 

D 

D²-statistic, 344

Darwin, Charles R. and Law of Large 
Numbers, 4 
Day, B. B., 357 
Degrees of freedom: 
geometric interpretation of, 139, 140 
physical interpretation of, 139 
statistical interpretation of, 140, 141
Deming, W. E., 208, 209
on errors in sampling surveys, 197 
Design of experiments: 
modern ideas of, 276 
nature of experimental observations, 
278 

orthogonality in, 295, 308 
randomization in, 282, 286 

test of significance dependent on, 
282 

validity of method of least squares 
dependent on, 284 

relation of statistical analysis to, 282 
necessity of exact tests and analysis 
of variance, 283 

replication, function of, in, 280, 286 
role of statistician in, 285 
self-contained property for, 277 
function of controls, 278 
valid estimate of experimental errors, 
280 

Design of sampling inquiries, 186, 192 
a comparative experiment on sampling 
methods, illustrative of, 202-207 
(See also Sampling) 
method of selecting sample, 203 
stratification proportionate to numbers, 204

stratification proportionate to product
of numbers and standard
deviation, 205

stratification with no restriction, 
203 

statistical aspects in, 193, 199 
Dice, throws with, 21 



Difference of two correlation coefficients: 
sampling distribution of, 53 
test of: 

on different samples, 53, 86 
on same sample, 54, 87 
Difference of two means: 
sampling distribution of: 

with known population variance, 
37 

with unknown population variance, 
47 

test of: 

of correlated measures, 75, 76 
of equal variances, 73-75 
of unequal variances, 80 (see also
Behrens-Fisher test and Cochran-Cox
test)

Difference of two percentages: 
sampling distribution of, 58-59, 165 
test of, 80-81, 120 

Difference of two regression coefficients, 
test of, 90 

Difference of two variances (or standard 
deviations): 

sampling distribution of, 54, 55 
test of, 81, 82 

Differences among set of variances: 
sampling distribution of, 83, 84 
test of: 

Bartlett’s test, 83, 84, 85 
Hartley’s method, 84, 85 
L₁-test, 83, 86

Direct probability, see Probability 
Discriminant function, see Multivariate 
analysis 

Disproportionate class numbers, see
Analysis of variance, unequal representation

Distribution, curves, 28 
problems of, 104 
Distributions: 
binomial, 25 
polynomial, 26 

simultaneous probability, 124; (see also 
binomial, Poisson, and normal, 25, 
26, 27) 

theoretical, 22 

Doolittle method, see Normal equations 
Duncan, W. J., 147 
Dwyer, P. S., 357 

E 

Eden, T., 225 

Efficiency (see also Estimation): 
of pairing, 80 
of sampling, 192 
Eisenhart, C., 68, 225 
Engelhart, M. D., 326 





Errors (see also Sampling): 
experimental, 280 
of bias, 188 

of first and second kind, 64 
theory of, 27 
Estimates: 
best linear, 193 
large sample, 34 
of a ranking, 175 
optimum, 105 
unbiased, 41 
Estimation: 
bias in, 39, 41, 281 
consistency in, 105 
efficiency in, 105 

interval, 104, 109, 111; (see also Confidence
and Fiducial)
point, 104 
limitations of, 108 
problem of, 15, 104, 193 
analysis of variance in, 275 
sufficiency, 105 

Estimation, method of (see also maximum 
likelihood): 

by minimum χ², 107, 108
by minimum variance, 107 
by moments, 107 

by principle of unbiased estimates, 108 
Expectation, mathematical, 22, 39, 105, 
193, 228, 346 

F 

Factor analysis (psychology), 353 
Factorial design, see Principles of experimentation

F-distribution (variance-ratio), 55 
Ferguson, G. A., 162, 168 
Fiducial inference, theory of, 109-110 
Fiducial limits, 109 

compared with confidence interval, 
112 

of an individual's score, 343 
of the mean, population variance unknown,
114-115
of the variance, 115-117 
Fiducial probability, 109 
Finney, D. J., 142, 162 
Fisher, R. A., 102, 147, 168, 208, 225, 275,
285, 326, 357

analysis of covariance, 216 
analysis of variance, 210 
applications of Student’s distribution, 
61 

design of experiments, 277, 278, 281 
discriminant function, 344 
distribution of χ² when parameter
estimated from data, 108
fiducial inference, 109 


Fisher, R. A. (cont.):

k-statistics, 153

measurement of information, 105 
measures of departure from normality, 
153, 155, 156-157 
table of t, 360
tables of χ², 361
z-distribution, 54-55
Forecasting, 3 

Four-fold point surface, correlation of, 
146 

Frazer, R. A., 147 

Freedom, degrees of, see Degrees of 
freedom 

Freeman, F. N., 275 

Frequency theory of probability, see 
Probability 

Friedman, M., 172, 183

G

g₁, a measure of skewness, 153, 154, 161
g₂, a measure of kurtosis, 154, 161
Gaddum, J. H., 160, 163, 164, 168 
Galton, Sir Francis, 7 
Gamma function, 44 
Garrett, H. E., 326 

Gaussian error curve, see Normal curve 
Gauss, K. F., 28 

Generalized distance of Mahalanobis, 344 
Gibbs, Willard, 5 
Goodness of fit, test of, 63 
Goulden, C. H., 326 

Greco-Latin square, see Principles of 
experimentation 
Guttman, L., 183 

H 

Handy, L. M., 102 
Hansen, M. H., 208, 209 
Harman, H. H., 353
Hartley, H. O., 102 
Heterogeneity, condition of, 211 
Holzinger, K. J., 275, 353 
Homogeneity, condition of, 211 
Homogeneity of variance, assumption of, 
211 

Horst, P., 357 

Hotelling, H., 11, 61, 183, 326, 344, 357
Hotelling’s T, see Multivariate analysis
Houseman, E. E., 325 
Hoyt, C. J., 134, 148 
Hsu, P. L., 102 
Hudelson, Earl, 21 
Hurwitz, W. N., 208 
Hypothesis, role of, in science, 62 
Hypothesis, testing of, see Statistical 
hypothesis 
I

Inference, 12, 14, see Statistical hypothesis

Information: 

in small samples, 105

invariance as measure of, 105 
relevant, 104 

Intelligence, distribution of, 160 
Interaction, see Analysis of variance 
Intra-class correlation, see Correlation 
Inverse probability, 109 
Item analysis, statistics used in, 147 

J 

Jackson, R. W. B., 148, 275, 357 
test of sensitivity, 129 
Jessen, R. J., 208 
Johnson-Neyman technique, 275 
Johnson, P. O., 168, 275, 326, 355 

K 

k-statistics:
definition, 154 
general properties, 153 
sampling cumulants of, 154 
Kelley, T. L., 146 

Kendall, M. G., 16, 30, 183, 208, 209 
multiple rankings, 174, 175 
paired comparisons, 176 
randomness, 187 
random sampling, 200 
random sampling numbers, 201 
Kermack, W. O., 208 
Kinsey, A. C., 207 
Kollektiv of von Mises, 19, 25 
Kolmogoroff, A., 30 
probability as abstract ensembles, 20 
Kuder, G. F., 148 

L 

L-tests, 83, 128 

Lagrange’s undetermined multipliers, 219, 
221, 223, 227 
Laplace, 25, 28 

Laplacian-Gaussian error curve, see Normal
curve

Latin square, see Principles of experimentation

Law: 

binomial, 26 

of chance, 31 

of error, 28 

of large numbers, 27 

of nature as statistical regularity, 4 

of single variable, 277 


Law (cont.):

of small numbers, 27 
second law of thermodynamics as a 
statistical, 5 

Least squares, principle of, 28, 176, 284, 
328 

Lehmer, E., 68 
Lew, E. A., 209 

Likelihood-ratio tests of an hypothesis, 
66, 67, 68; (See L-tests) 

Lindquist, E. F., 209, 326 
Linear function, 27, 28 
Linear hypothesis, testing, 244 
Linearity of regression, see Regression 
Linear scale, 155 

Location, estimation of parameters of, 
193 

M 

m-rankings, the problem of, 174
McCall, W. A., 168 
McKendrick, A. G., 208 
MacKenzie, W. A., 225 
McNemar, Quinn, 209 
Madhva, K. B., 209 
Madow, L., 209 
Madow, W. G., 209 

Mahalanobis, P. C., 61, 191, 194, 209, 
344, 357 

Markoff, A., 108, 194 
Marks, E. S., 207 
Martin, W., 357 
Maung, K., 357 
Maximum likelihood: 
estimate, 106 

consistency of, 106 
efficiency of, 106 
function of parameters, 106 